Methods and systems for predicting crop features and evaluating inputs and practices

ABSTRACT

A method and system for evaluating and predicting a set of crop-associated features at an agriculture site, the method comprising: receiving a set of samples associated with the agriculture site; generating a sample dataset upon processing the set of samples with a set of sample processing operations; generating a set of microbiome-associated features upon performing a set of transformation operations upon the sample dataset; and returning an analysis characterizing the set of crop-associated features based upon the set of microbiome-associated features. The method can further include steps for executing an action for producing a desired outcome in relation to the agriculture site, with respect to a specific soil type and a specific crop, based upon the analysis.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 63/143,159 filed on 29 Jan. 2021, U.S. Provisional Application No. 63/143,534 filed on 29 Jan. 2021, and U.S. Provisional Application No. 63/143,600 filed on 29 Jan. 2021, each of which is incorporated in its entirety herein by this reference.

FIELD OF THE INVENTION

The disclosure generally relates to tools and systems implementing methods for sampling and characterizing agricultural sites.

BACKGROUND

Agriculture ecosystems are human-managed ecosystems subject to various ecological rules, in relation to steady state scenarios and in response to various perturbations. Understanding the ecological mechanisms behind soil microbial communities is a fruitful way to improve management practices, test various products, agricultural practices and/or other agricultural inputs, evaluate sustainability, and therefore improve agriculture site productivity. Acquisition and processing of the appropriate data from agriculture-associated samples, development of models for characterization of ecosystem statuses, and generation of outputs and implementation of actions for maintaining such ecosystems, improving yields, improving crop nutrient content, improving soil carbon sequestration characteristics, and/or improving produce shelf-life in a sustainable manner are all areas of innovation in which the inventions described herein provide value.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 depicts an embodiment of a workflow of a method for evaluating and predicting crop features.

FIG. 2 depicts an embodiment of a portion of a workflow of a method for evaluating and predicting crop features.

FIG. 3 depicts an embodiment of a portion of a workflow of a method for evaluating and predicting crop features with respect to training and refining developed models.

FIGS. 4A-4D depict variations of outputs of a method for evaluating and predicting crop features.

FIG. 5 depicts outputs related to relative abundance of taxonomic groups at various time points in relation to inputs associated with a method for evaluating and predicting crop features.

FIGS. 6A-6B depict outputs related to importance in yield predictions of a method for evaluating and predicting crop features.

FIGS. 7A-7C depict outputs of a method for evaluating and predicting crop features.

FIGS. 8A-8K depict outputs of stages of a method for evaluating and predicting crop features (e.g., yield, nutritional data).

FIG. 9 depicts outputs of a method for evaluating and predicting crop features in response to applied treatments.

FIGS. 10A-10B depict variations of a method for evaluating agricultural inputs and practices.

FIG. 11 depicts a schematic of an embodiment of a system for evaluating and predicting crop features.

DETAILED DESCRIPTION OF THE INVENTION(S)

The following description of the preferred embodiments of the invention is not intended to limit the invention to these preferred embodiments, but rather to enable any person skilled in the art to make and use this invention.

1. Benefits

The invention(s) described can confer several benefits over conventional systems, methods, and compositions.

The invention(s) provide systems and methods for prediction of various agriculture site and crop features, which are useful in downstream applications in relation to recommending or implementing various agriculture inputs and/or management practices to improve productivity or maintain health of the agriculture site.

Additionally, in embodiments, the invention(s) described implement rapid processing of samples and analysis of data generated from sample processing, in order to extract insights related to predicted features of crops and agriculture sites (e.g., yield, nutritional composition, etc.), in a manner that cannot be practically performed by the human mind.

Additionally, in embodiments, the invention(s) provide methods for determining microbiome-associated or -derived properties and functions, and/or properties and functions derived from network properties in local microbial, fungal, and/or other organism communities, and for using them to assess the impact of different agricultural inputs and/or practices (e.g., farming practices).

Additionally, in embodiments, the invention(s) can further provide methods and systems for evaluating, guiding, and/or executing implementation of various agricultural inputs and/or management practices for enhancement of yield (e.g., in relation to specific soil types and/or for specific crops) and/or improvement of agriculture site characteristics (e.g., with respect to health, with respect to sustainability).

In variations, the invention(s) described returned outputs for evaluating the effectiveness and potential mechanism of action of biostimulants with respect to different soil profiles and specific crops to generate precise product recommendations and implementation of interventions based on local conditions, for increasing crop yield. In a specific example, the invention(s) processed bulk soil and rhizosphere soil samples to determine microbial composition and structure, for evaluation of the effect of a Bacillus amyloliquefaciens strain QST713 inoculant on potato crops, with applications in improving crop yield. With application, the QST713 inoculant applied as a treatment according to the method was found to have a significant effect on yield through modulation of the structure of fungal and bacterial communities (e.g., measured using co-occurrence and co-exclusion networks), without causing a detectable long-lasting effect on the alpha- and beta-diversity patterns after harvest.

In embodiments, the method(s) promote agro-ecosystem sustainability through assessment of soil organism communities. In particular, the complexity of microbial communities, at both taxonomic and functional levels, is impossible to assess practically without systems and methods described herein, where the methods cannot be practically implemented by the human mind. The invention(s) thus process samples to extract patterns connecting sample microbiome composition with ecosystem function in order to drive interventions based upon the impact of biotic (e.g., interspecies interactions, intraspecies interactions) and abiotic (e.g., climate or anthropogenic disturbances) factors. As such, the invention(s) provide a new methodological framework (inferring emergent properties from local networks) with assessment and guidance of different ecological strategies in agricultural site communities. In practical applications, the methods can be used to restore soil functionality, predict yields, manage crop vulnerabilities, optimize farming practices, and improve the sustainability of agricultural sites. Additionally or alternatively, the inventions can guide or inform management practices in relation to effects on soil carbon sequestration.

Additionally, the inventions described provide systems and a platform including architecture for agriculture sample extraction and processing, which provide improved tools for monitoring, forecasting, and responding to events (e.g., changes in productivity, events associated with management practices, environmental perturbations, product-induced perturbations, etc.) associated with one or more agricultural sites. Additionally or alternatively, the inventions can assess implementation of a plant variety and/or a seed variety at an agriculture site.

Additionally, the inventions apply outputs of the analyses to effect one or more actions (e.g., treatments) to maintain or improve the natural ecological site conditions, thereby providing practical applications of the method(s) and models involved.

Additionally, the inventions involve collection of samples from various agricultural sites, processing of samples to extract data features, application of one or more transformations to the data features to generate modified digital objects, creation of improved training data sets for machine learning/classification algorithms, and iterative training of the machine learning/classification algorithms, such that agriculture site statuses can be returned upon processing subsequent samples hitherto unseen by the algorithms.

In applications, the inventions can contribute to significantly increased yields of major/important crops (e.g., rice, wheat, soybeans, maize, potatoes, etc.) to improve global food production in relation to anticipated world population increases. Taking into account the effects of human intervention on soil ecology, the inventions can provide recommendations (management, treatment, etc.) that increase yield while preserving ecology. In particular, using potato crops as an example, applications of the inventions can characterize yield (e.g., maximum potential yield) of potato crops based on current inputs and management practices, and/or recommend or implement agricultural inputs and improved practices for enhancement of yield and/or agriculture site characteristics.

Additionally or alternatively, the invention(s) can confer any other suitable benefit in any crop.

1.1 Definitions

The terms microbiome, microbiome information, microbiome data, microbiome population, microbiome panel and similar terms are used in the broadest possible sense, unless expressly stated otherwise, and would include: a census of currently present microorganisms, both living and non-living, which may have been present months, years, millennia or longer; a census of components of the microbiome other than bacteria and archaea (e.g., viruses, microbial eukaryotes, etc.); population studies and characterizations of microorganisms, genetic material, and biologic material; a census of any detectable biological material; and information that is derived or ascertained from genetic material, biomolecular makeup, fragments of genetic material, DNA, RNA, protein, carbohydrate, metabolite profile, fragment of biological materials and combinations and variations of these.

As used herein, the terms real-time microbiome data and real-time microbiome information include microbiome information that is collected or obtained at a particular setting or stage of an agricultural process for one or more agricultural sites.

As used herein, the terms derived microbiome information and derived microbiome data are to be given their broadest possible meaning, unless specified otherwise, and include any real-time microbiome information that has been computationally linked or used to create a relationship.

As used herein, the terms predictive microbiome information and predictive microbiome data are to be given their broadest possible meaning, unless specified otherwise, and include information that is based upon combinations and computational links or processing of historic, predictive, real-time, and derived microbiome information, data, and combinations, variations and derivatives of these, which information predicts, forecasts, directs, or anticipates a future occurrence, event, state, or condition in the industrial setting, or allows interpretation of a current or past occurrence.

Real-time, derived, and predicted data can be collected and stored, and thus become historic data for ongoing or future decision-making for a process, setting, or application.

“Nucleic acid,” “oligonucleotide,” and “polynucleotide” refer to deoxyribonucleic acids (DNA) or ribonucleic acids (RNA) and polymers thereof in either single- or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing known analogues of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. The term nucleic acid is used interchangeably with gene, cDNA, and mRNA encoded by a gene.

The term “microbiome”, as used herein, refers to the ecological community of commensal, symbiotic, or pathogenic microorganisms in a sample.

The term “genome”, as used herein, refers to the entirety of an organism's hereditary information that is encoded in its primary DNA sequence. The genome includes both the genes and the non-coding sequences. For example, the genome may represent a microbial genome or a mammalian genome.

Reference to “DNA region” should be understood as a reference to a specific section of genomic DNA. These DNA regions are specified either by reference to a gene name or a set of chromosomal coordinates. Both the gene names and the chromosomal coordinates would be well known to, and understood by, the person of skill in the art. In general, a gene can be routinely identified by reference to its name, via which both its sequences and chromosomal location can be routinely obtained, or by reference to its chromosomal coordinates, via which both the gene name and its sequence can also be routinely obtained.

Reference to each of the genes/DNA regions detailed above should be understood as a reference to all forms of these molecules and to fragments or variants thereof. As would be appreciated by the person of skill in the art, some genes are known to exhibit allelic variation or single nucleotide polymorphisms (SNPs). SNPs encompass insertions and deletions of varying size and simple sequence repeats, such as dinucleotide and trinucleotide repeats. Variants include nucleic acid sequences from the same region sharing at least 90%, 95%, 98%, or 99% sequence identity, i.e., having one or more deletions, additions, substitutions, inverted sequences, etc. relative to the DNA regions described herein. Accordingly, the present invention should be understood to extend to such variants which, in terms of the present applications, achieve the same outcome despite the fact that minor genetic variations between the actual nucleic acid sequences may exist between different bacterial strains. The present invention should therefore be understood to extend to all forms of DNA which arise from any other mutation, polymorphic or allelic variation.

Genetic sequences and/or fragments thereof can be targets of interest. Targets can additionally or alternatively include other nucleic acids (e.g., DNAs, RNAs), amino acids, proteins, other molecules, chemicals, other analytes, or other suitable material.

The term “sequencing” as used herein refers to sequencing methods for determining the order of the nucleotide bases (adenine, guanine, cytosine, and thymine) in a nucleic acid molecule (e.g., a DNA or RNA nucleic acid molecule).

The term “barcode”, as used herein, refers to any unique, non-naturally occurring nucleic acid sequence that may be used to identify the originating source of a nucleic acid fragment.

A “computer-readable medium” is an information storage medium that can be accessed by a computer using a commercially available or custom-made interface. Exemplary computer-readable media include memory (e.g., RAM, ROM, flash memory, etc.), optical storage media (e.g., CD-ROM), magnetic storage media (e.g., computer hard drives, floppy disks, etc.), punch cards, or other commercially available media. Information may be transferred between a system of interest and a medium, between computers, or between computers and the computer-readable medium for storage or access of stored information. Such transmission can be electrical, or by other available methods, such as IR links, wireless connections, etc.

Where a range of values is provided, it is understood that each intervening value between the upper and lower limit of that range, and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

2. Methods

As shown in FIG. 1, an embodiment of a method 100 for evaluating and predicting a set of crop-associated features at an agriculture site includes: receiving a set of samples from an agriculture site or in association with an agricultural process S110 (e.g., associated with an agricultural process at the agriculture site); generating a sample dataset upon processing the set of samples with a set of sample processing operations S120; generating a set of microbiome-associated features upon performing a set of transformation operations upon the sample dataset S130; and returning an analysis characterizing the set of crop-associated features based upon the set of microbiome-associated features S140.

In some variations, the method 100 can additionally or alternatively include executing an action for producing a desired outcome in relation to the agriculture site, with respect to a specific soil type and a specific crop, based upon the analysis S150.

The method 100 functions to generate predictions of crop-associated features at one or more agricultural sites, based upon a finite number of input features derived from processing of samples at the agriculture site(s). As such, the method 100 can generate insights for improvement of outcomes at the agriculture site(s), using novel processes, that provide improved efficiency and efficacy of achieving results at the agriculture site(s). Variations of the methods described can be implemented for various crop types, soil types, agriculture site locations, and/or other factors.

Furthermore, in downstream applications, refinement of models, system architecture, and sample processing techniques can be used to guide testing of, recommendation of, and/or implementation of (e.g., using automated or manual systems/devices) agricultural inputs, products for use, and management practices, in order to improve desired outcomes (e.g., in relation to yield, in relation to agriculture site health, in relation to sustainability, etc.). As such, the method(s) can provide steps for monitoring, controlling, and analyzing agriculture activities, with practical applications in food production, viticulture, bio-fuel production, and other agricultural activities.

In one example use-case, the method provided steps for and implemented systems that produced a greater percentage of potential yield (e.g., where actual yield typically corresponds to about 10-75% of the yield using standard methods). As such, the methods implement new technology for recovering a significant portion of lost yield.

Beyond characterizing ecological communities in terms of community aggregated traits (CATs), which result from the aggregation of taxa characteristics, the methods described provide characterizations and guidance for interventions based upon emergent properties (EPs) arising from specific taxa combinations and/or other features. In particular, the invention(s) produce models with architecture for contextualization of emergent properties into ecological mechanisms, with functionality for returning predictions of how communities would behave under various circumstances. Additionally, translating the idiosyncratic community behaviors into a measurable metric enables microbiome monitoring applications, such as in sustainable farming, food production, or human health. The methods thus discover, identify, and implement new biomarkers of agriculture site health and provide tools that make such information and guidance accessible to suitable entities.

The method(s) described can be implemented by systems and platforms described in Section 3 below. Additionally or alternatively, the method(s) described can be implemented by embodiments, variations, and examples of systems described in U.S. application Ser. No. 15/779,531 filed on 5 Dec. 2016, which is herein incorporated in its entirety by this reference.

2.1 Methods—Sample Reception

Step S110 recites: receiving a set of agriculture samples from one or more agriculture sites or in association with an agricultural process, which functions to provide source material for generation of data from which models for characterizing statuses of the agriculture site(s) and/or various perturbations can be generated in downstream steps.

In step S110, samples can be received from various portions of the agriculture site(s) and/or states of processing of crops or other products derived from the agriculture site. In embodiments, samples can be extracted from soil, another substrate, water used in agriculture, from various portions of crops, from organisms interacting with crops (e.g., parasites, other symbiotic organisms, etc.), from consumable products (e.g., food, beverages, supplements, etc.) derived from crops, from other surfaces (e.g., conduits used to deliver water or nutrients to crops, etc.), and/or from other suitable sampling sites. The samples can include solid samples (e.g., soil, sediment, rock, food samples). The samples can additionally or alternatively include liquid samples (e.g., surface water, sub-surface water, other liquids derived from crops, consumable products derived from crops, crop-derived products at various stages of processing or fermentation, etc.). The samples can additionally or alternatively include gas samples (e.g., samples from gases obtained from a greenhouse, gases produced during processing of crops or crop-derived products, etc.). Samples can be taken from crop portions (e.g., reproductive portions, petals, leaves, fruits, roots, trunks, flowers, pollen, etc.) and/or from crops in various states of health (e.g., healthy states, distressed states, diseased states, etc.).

Sample masses can range from 0.01 grams to 1 kilogram (or greater than 1 kilogram, or less than 0.01 gram). Additionally or alternatively, sample volumes can range from 1 microliter to 1 liter (or greater than 1 liter, or less than 1 microliter).

Samples from different portions of the agriculture site, different portions of a crop, different portions or stages of a product being produced, and/or different sources can be combined in Step S110.

In examples, whole plants were collected and processed to extract samples from bulk soil (e.g., with vigorous shaking to acquire bulk soil from roots) and rhizospheres (e.g., by chopping roots separated from mother tubers). In more detail, a total of 185 samples (e.g., from treated and untreated plots at multiple agriculture sites) were collected over four time points at multiple locations associated with various metacommunities (e.g., in Michigan and Idaho), from planting (T0) to harvest (T3), focusing on the early changes occurring one (T1) and two (T2) months after planting. T0 and T3 were bulk soil samples, and T1 and T2 were rhizosphere soil samples. However, in alternative variations, samples can be taken from other suitable sources associated with crops, agriculture sites, and/or product processing sites.

In relation to step S110, sample reception/collection can be performed using equipment (e.g., machinery, robotic apparatus configured to traverse an agricultural site in coordination with retrieval of the set of agriculture samples, other apparatus) and/or manually. In variations, sample reception/collection performed in Step S110 can use any one or more of: an instrument (e.g., scoop for soil, sharp instrument for extracting a portion of a crop specimen, etc.), a permeable substrate (e.g., a swab, a sponge, etc.), a non-permeable substrate (e.g., tape, etc.), a container (e.g., vial, tube, bag, etc.) configured to receive a sample from the agriculture site or associated crops, and any other suitable sample-reception element. In a specific example, samples can be collected from one or more of: soil, other crop-associated solids, water, other crop-associated liquids, gases, and a crop component (e.g., root, stem, leaf, flower, seed, other plant component, etc.). In relation to soil samples, samples can be extracted in relation to a reference point (e.g., distance from surface, distance from plant, etc.). In relation to plant components, samples can be taken in relation to a reference (e.g., distance from leaf, distance from node, distance along root, etc.). In variations in which multiple samples are taken, samples can be pooled (e.g., combined) or kept distinct.

In relation to step S110, samples can be acquired once, or at several time points within a time period or in relation to a process (e.g., process associated with crop handling, fermentation process, process for preparing crops for consumption, etc.). The time period can be on the order of seconds, minutes, hours, days, months, years, decades, or of any other suitable time scale. In one example, samples were taken at four time points from planting to harvest. However, samples can additionally or alternatively be taken prior to planting and/or post-harvest.

Furthermore, samples can be received from one or more metacommunities, where a metacommunity is defined as a group of communities within the same habitat/region/pool associated with each agriculture site associated with the set of samples, where the group(s) of communities display multiple possible arrangements according to environmental filters, dispersal restrictions, priority effects, and later-established interactions. As such, features, insights, and actions implemented in subsequent steps of the method can be generated or performed at the metacommunity level and/or at local levels of abstraction.

In one example, step S110 involved reception/collection of soil samples from vineyards from multiple geographic locations (e.g., U.S., Spain) over a certain time period (e.g., years). In the example, the samples were taken from topsoil, at a 30 cm distance from the vine trunk, at a depth of 5-10 cm.

In other examples, however, samples can be acquired in another suitable manner or from other suitable sources.

In relation to step S110, some approximation of the whole method 100 can be run with only latitude and longitude information in the absence of samples. With geostatistical models and Markov processes, the sample characteristics can be predicted without physical sampling, and the method 100 can be subsequently run to provide an ecological picture of the unknown soil. All available information can be modeled while missing information can be imputed implicitly with Bayesian models.

2.2 Methods—Sample Processing

Step S120 recites: generating a sample dataset upon processing the set of samples with a set of sample processing operations, which functions to process raw sample material with one or more operations, thereby generating base data from which features can be extracted in subsequent portions of the method. In step S120, processing the set of samples can include wet lab processing techniques (e.g., sample lysis, sample enrichment, sample purification, target material capture or separation, target amplification, etc.), as well as sequencing and library preparation operations. As such, generating sample data in step S120 includes a combination of sample processing techniques (e.g., wet laboratory techniques) and computational techniques (e.g., utilizing tools of bioinformatics) to quantitatively and/or qualitatively characterize the microbiome, functional features, and/or other aspects (e.g., chemistry) of each sample of the agricultural site(s). Sample processing operations can include generation of one or more of: a full metagenomic dataset, a metatranscriptomics dataset, and a proteomics dataset.

As such, in variations, step S120 can include one or more of: sample pre-processing (e.g., with homogenization or chopping, with use of a buffer, with formation of a pellet, etc.); sample storage (e.g., at appropriate conditions prior to subsequent processing, such as at −80° C., at 4° C., at another suitable temperature, etc.); sample lysis (e.g., using physical methods, using chemical methods, using biological methods, etc.); genetic material (e.g., nucleic acid material) extraction including extraction of DNA, RNA, nucleic acid fragments, or other nucleic acid material; protein extraction; nucleic acid purification (e.g., using precipitation, using liquid-liquid based purification, using chromatography, using binding moiety functionalized particles, etc.); target material capture; removal of sample waste; target incubation; target amplification (e.g., using polymerase chain reaction (PCR)-based techniques, using helicase-dependent amplification (HDA), using loop mediated isothermal amplification (LAMP), using self-sustained sequence replication (3SR), using nucleic acid sequence based amplification (NASBA), using strand displacement amplification (SDA), using rolling circle amplification (RCA), using ligase chain reaction (LCR), etc.); target enrichment; and/or any other suitable sample processing steps.

In relation to amplification of nucleic acids, primers used can be designed to mitigate amplification bias effects, as well as configured to amplify nucleic acid regions/sequences (e.g., of the 16S region, 18S region, the ITS region, etc.) that are informative taxonomically, phylogenetically, in relation to emergent properties, for formulations, and/or for any other suitable purpose. Primers used in variations of step S120 can additionally or alternatively include incorporated barcode sequences, unique molecular identifiers, adaptor sequences, or other sequences specific to each sample and/or in association with sequencing platforms, which can facilitate identification of material derived from individual samples post-amplification. Examples of custom primers are described in WO 2017/096385 published 8 Jun. 2017, which is herein incorporated in its entirety by this reference.
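
As an illustration of how sample-specific barcode sequences can facilitate identification of material derived from individual samples, the following is a minimal sketch in Python. It is not the primer design or software of the described examples. The function name `demultiplex`, the barcode strings, and the sample labels are hypothetical, and the sketch assumes exact-match barcodes located at the start of each read.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Assign each read to a sample by its leading barcode and strip the barcode.

    Assumption: barcodes sit at the 5' end of the read and must match exactly.
    Reads with no recognized barcode are collected under "unassigned".
    """
    # Try longer barcodes first so a short barcode cannot shadow a longer one.
    lengths = sorted({len(b) for b in barcode_to_sample}, reverse=True)
    by_sample = defaultdict(list)
    for read in reads:
        for n in lengths:
            sample = barcode_to_sample.get(read[:n])
            if sample is not None:
                by_sample[sample].append(read[n:])
                break
        else:
            by_sample["unassigned"].append(read)
    return dict(by_sample)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "plot_A", "TTAG": "plot_B"}
reads = ["ACGTTTGACC", "TTAGGGCATT", "GGGGAAAA"]
assigned = demultiplex(reads, barcodes)
```

In practice, demultiplexing is commonly handled by the sequencing platform software and may tolerate mismatches in the barcode; the exact-match rule here is a simplification.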

Furthermore, sequencing can be performed in coordination with a next generation sequencing platform (e.g., Illumina™ sequencing platform) or other suitable sequencing platform (e.g., nanopore sequencing platform, PacBio platform, MinION platform, etc.). Additionally or alternatively, any other suitable sequencing platform or method can be used (e.g., a Roche 454 Life Sciences platform, a Life Technologies SOLiD platform, etc.). Additionally or alternatively, sample processing can implement any other step configured to facilitate processing (e.g., using a Nextera kit) for performance of a fragmentation operation (e.g., fragmentation and tagging with sequencing adaptors) in cooperation with amplification. Additionally or alternatively, filtering of sequences (e.g., chimeric sequences, other sequences, etc.) can be performed in coordination with step S120.

In examples of sample processing according to step S120, soil samples were stored at −80° C. in buffer until performance of a nucleic acid extraction operation (e.g., DNA extraction), where nucleic acid extraction was performed using a kit for extraction of organism DNA (e.g., DNeasy PowerLyzer PowerSoil Kit™, Qiagen™). Libraries were then prepared following a two-step PCR protocol (e.g., associated with an Illumina™ platform and protocol), and sequenced on an Illumina MiSeq™ platform using paired end sequencing (e.g., at 2×300 bp). Post-sequencing, a library preparation operation was performed, where libraries were generated upon amplification and sequencing of target regions (e.g., 16S rRNA V4 region, the ITS1 region, etc.) using custom primers as described above, and raw sequences were analyzed using VSEARCH with default parameters. Briefly, raw paired-end FASTQ sequences (forward and reverse paired reads) were merged, filtered by an expected error threshold of 0.25, dereplicated, and sorted by size. A filtering operation was performed where chimera sequences were filtered out in coordination with clustering of non-singleton sequences into 97% identity operational taxonomic units (OTUs), or into amplicon sequence variants using a single mismatch (99.7% identity). Taxonomic annotation was performed (e.g., using a SINTAX algorithm, using algorithms that implement k-mer similarity metrics to identify top taxonomic candidates for annotation, using algorithms that identify full-length alignments to reference sequences, etc.). In one example, all combined sequences were then mapped to a list of 31,516 OTUs with an identity threshold (e.g., at least 90% identity, at least 95% identity, at least 97% identity, at least 99% identity, etc.), resulting in an OTU table with 54,738,544 sequences, averaging 156,395 sequences per soil sample. Individual samples contained only a fraction of the total OTU richness, averaging 529 OTUs (e.g., in relation to a range of 23-4999 OTUs) per soil sample. OTUs were then classified (e.g., with a UNITE database according to a UTAX pipeline, with a SILVA 123 database through a SILVA-NGS pipeline). However, variations of the example can implement other sequencing protocols, OTU mapping, OTU classification algorithms, and/or other methods.

In variations, the method can additionally or alternatively include processing and assessment of amplicon sequence variants (ASVs). In particular, during sequencing, it is expected that some “identified” nucleotide sequences may be incorrect due to sequencing errors; thus, the reads are clustered together to compensate for this, grouping similar sequences to form clusters which are represented by the centroid sequence (i.e., the most abundant sequence of the cluster). In situations where a 97% sequence identity threshold for OTUs is too inclusive for some families of species, processes of the method can include clustering of sequences that differ by only one nucleotide, in order to maintain the highest possible granularity and keep small differences visible, such that they can be annotated separately. As such, the method can include performing an amplicon sequence variant operation involving: clustering of the set of identified sequences into a set of clusters; and identifying, for each of the set of clusters, a centroid sequence representing a most abundant sequence of a respective cluster of the set of clusters. Clustering the set of identified sequences can include grouping sequences of the set of identified sequences having a difference in nucleotides satisfying a threshold condition (e.g., a difference less than 1 nucleotide, a difference less than 2 nucleotides, a difference less than 3 nucleotides, a difference less than 10 nucleotides, etc.).
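
By way of illustration only, the clustering and centroid selection described above can be sketched as follows. The sketch assumes equal-length sequences compared by Hamming distance and a greedy, abundance-ordered assignment; the function names (hamming, cluster_reads) and the max_diff parameter are hypothetical and are not taken from the method itself.

```python
from collections import Counter

def hamming(a, b):
    """Count mismatched positions; sequences of unequal length never match."""
    if len(a) != len(b):
        return float("inf")
    return sum(x != y for x, y in zip(a, b))

def cluster_reads(reads, max_diff=1):
    """Greedy clustering (an assumed strategy, for illustration).

    Unique sequences are visited from most to least abundant. Each one joins
    the first existing centroid within max_diff mismatches, otherwise it
    founds a new cluster. Because of the visiting order, every centroid is
    the most abundant sequence of its cluster.
    Returns a dict mapping centroid sequence -> total reads in the cluster.
    """
    counts = Counter(reads)
    clusters = {}
    # Sort by descending abundance; ties are broken alphabetically.
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in clusters:
            if hamming(seq, centroid) <= max_diff:
                clusters[centroid] += n
                break
        else:
            clusters[seq] = n
    return clusters

# Toy reads (made up for the example).
reads = ["ACGT"] * 5 + ["ACGA"] * 2 + ["TTGA"] * 3 + ["TTGC"]
print(cluster_reads(reads))
```

With max_diff=1 the four toy sequences collapse into two clusters; with max_diff=0 every distinct sequence is kept as its own variant.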

Thus, in certain variations, ASV-associated approaches can significantly increase the number of final sequences to annotate for the same sample, increasing resolution and allowing better discrimination of closely related species. This approach can also allow annotation of ASVs against curated taxonomic databases based on exact sequence matches, which allows assessing in silico performance metrics for the annotation of each ASV. In specific applications, 16S ASVs provided suitable performance metrics (e.g., >90% sensitivity, >90% specificity, >90% positive predictive value, >90% negative predictive value) for identifying ~46% of the species and ~89% of the genera. ITS ASVs also provided good performance metrics for identifying ~87% of species and ~97% of the genera.

In variations, Bayes factors derived from the posterior odds of a connection between OTUs or ASVs can be used as edge-weights for weighted directed networks, and derivative features processed by models associated with the methods.

In relation to sample acquisition and sequencing, sample data can be tagged with contextual data, in order to couple identified sample features with various conditions (e.g., perturbations, products, environmental conditions, etc.) in downstream steps of the method. In variations, contextual data can include one or more of: geographic location (e.g., latitude, longitude, altitude); meteorological metadata (e.g., from Dark Sky API); climatic information (e.g., precipitation intensity, precipitation probability, maximum temperature, minimum temperature, dew point, humidity, environmental pressure, wind speed, wind bearing, wind gust, cloud cover, UV index, etc.); environmental disaster information (e.g., fires, hurricanes, tornadoes, earthquakes, temperature variations, etc.); organic management practices (e.g., integrating cultural, biological, and mechanical practices that foster cycling of resources, promote ecological balance, and conserve biodiversity without use of synthetic fertilizers, sewage, irradiation, and genetic engineering); non-organic management practices; use of synthetic fertilizers; use of natural fertilizers; biodynamic management practices (e.g., with generation of their own fertility through composting, integrating animals, cover cropping, and crop rotation); and conventional management practices (e.g., with standard farming systems, using a variety of synthetic chemical fertilizers, pesticides, herbicides and other continual inputs, etc.).

In variations, perturbations associated with the agriculture site(s) and/or crops from which samples are derived can include one or more of: a management practice (e.g., a conventional management practice, an organic management practice, and a biodynamic management practice); a regenerative practice (e.g., application of one or more of a cover crop, silvopasture, managed grazing, intercropping, etc.); a biological input including one or more of: a biostimulant, a biofertilizer, a biocontrol agent, a biopesticide, compost, and a biodynamic preparation (wherein the biological input is applied by one or more of: a broadcast spray, an in-furrow spray, seed treatment, application to soil with incorporation, and application to soil without incorporation, etc.); a natural ecological disturbance; and another suitable perturbation.

Data can additionally or alternatively be tagged with metacommunity descriptors, thereby tagging sequence data of the sample dataset with a set of metacommunity descriptors corresponding to a set of communities within a same habitat associated with the agriculture site. In particular, a metacommunity is defined as a group of communities within the same habitat/region/pool associated with each agriculture site associated with the set of samples, where the group(s) of communities display multiple possible arrangements according to environmental filters, dispersal restrictions, priority effects and the later established interactions. As such, in subsequent steps of the method 100, computing architecture for merging the metacommunity-inferred associations into each of the local communities associated with the set of samples enables returning of estimations of network properties in all the local communities within the metacommunity, individually, thereby obtaining sample (site)-specific information on microbial ecosystem functioning. Such processes also enable direct comparison among network properties of individual samples, even in the absence of common taxa among them, as all samples are mapped back to the metacommunity, thereby providing a normalization step. Thus, these emergent properties can be implemented as machine-determined universal biomarkers of ecological disturbance.

In relation to model architecture associated with training and refinement of machine learning models described further below, the method described in relation to step S120 can be used to create training sets of data, in coordination with step S130 below. As such, training data covering specific sample features and corresponding contextual information related to management practices and other perturbations (e.g., use of various products, environmental perturbations, other agricultural inputs, other practices, etc.) can be used to refine models for predicting effects of various practices and perturbations, and to guide future management practices in a sustainable manner.

In order to process such data, computing platforms implementing one or more portions of the method can be implemented in one or more computing systems, wherein the computing system(s) can be implemented at least in part in the cloud and/or as a machine (e.g., computing machine, server, mobile computing device, etc.) configured to receive a computer-readable medium storing computer-readable instructions. However, step S120 can be performed using any other suitable system(s).

2.3 Methods—Data Transformation for Extraction of Features

Step S130 recites: generating a set of microbiome-associated features upon performing a set of transformation operations upon the sample dataset, which functions to extract features associated with network properties of sequences associated with organisms. For instance, transformations can be associated with functional inference based on the taxonomic abundances and the known genetic composition of the microorganisms identified, as well as with ecological network factors, where comparison of functions and ecological network properties has a higher degree of universality and thus allows more robust comparisons.

Step S130 can additionally or alternatively generate microbiome composition/community features (e.g., in relation to taxonomical features), as described above. As such, step S130 can be used to generate features that can be processed by models generated and trained as described below, in order to better understand the ecological processes and mechanisms behind community assembly. These processes are envisaged to be a collection of inter- and intra-species interactions, which are represented by a network (formally, a graph in discrete mathematics). In subsequent steps, structural properties of such networks and their relationships can be contextualized as emergent properties, which can be used to characterize statuses and responses to perturbations with respect to the agriculture site(s) being assessed.

In more detail, step S130 functions to generate features that go beyond Community Aggregated Traits (CATs) associated with constituent taxa of the set of samples, by generating features based upon emergent properties that arise from specific community arrangements. Such emergent properties are then processed to generate insights related to the functionality of crop communities (e.g., seed survival rate), microbial communities (e.g., biofilm density, as a cause of composition behaviour), and/or other communities in subsequent steps.

In relation to step S130, the computing platform described can process outputs of step S120 to generate a community dataset characterizing communities of organisms associated with the sample(s) acquired in step S110. Generating the community dataset can include one or more of: rarefying samples to a desired sequencing depth in order to provide a desired level of detectability of OTUs of the sample(s); filtering OTUs with a desired threshold condition (e.g., retaining OTUs represented in a threshold number of samples); implementing a test for assessing that local communities are represented adequately (e.g., using a Mantel test of Bray-Curtis dissimilarities); transforming one or more data outputs derived from step S120 to include presence and absence factors with respect to co-inclusion and/or co-exclusion of individual species (or other taxonomic units); rescaling of counts such that compositional data, which is bounded in the [0, 1] interval (i.e., a relative abundance), can be represented on the full number line, ranging from negative infinity to positive infinity; retrieving significant co-inclusion and co-exclusion properties (e.g., for samples associated with individual sites, independently of each other), in order to provide data representing potential for interactions in the complete metacommunity and/or environmental distributions (e.g., thereby generating a first grouping of positive pairs of organisms and a second grouping of negative pairs of organisms); and performing other suitable data transformation steps.
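
By way of illustration only, three of the operations listed above (rarefying, prevalence filtering, and rescaling of compositional data onto the full number line) can be sketched as follows. The rescaling transform is not named above; the centered log-ratio (CLR) transform is assumed here as one common choice. The function names, the pseudocount, and the fixed random seed are likewise assumptions made for the example.

```python
import math
import random

def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from one sample.
    Returns None when the sample holds fewer than `depth` reads."""
    if sum(counts) < depth:
        return None
    # Expand counts into one entry per read, labeled by OTU index.
    pool = [i for i, c in enumerate(counts) for _ in range(c)]
    picked = random.Random(seed).sample(pool, depth)
    out = [0] * len(counts)
    for i in picked:
        out[i] += 1
    return out

def filter_by_prevalence(table, min_samples):
    """Keep OTUs with a non-zero count in at least `min_samples` samples.
    `table` maps OTU id -> list of per-sample counts."""
    return {otu: c for otu, c in table.items()
            if sum(1 for x in c if x > 0) >= min_samples}

def clr(counts, pseudocount=0.5):
    """Centered log-ratio of one sample; the pseudocount (an assumption)
    avoids log(0). The transformed values are unbounded and sum to zero."""
    logs = [math.log(x + pseudocount) for x in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Toy OTU table (made up): three OTUs across three samples.
table = {"otu1": [10, 0, 3], "otu2": [0, 0, 1], "otu3": [5, 2, 8]}
kept = filter_by_prevalence(table, min_samples=2)
first_sample = [kept[otu][0] for otu in kept]
print(sorted(kept), clr(first_sample))
```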

Then, to generate a network property dataset, the computing platform described can process the community dataset with architecture for implementing one or more processes including: transforming the first grouping of positive pairs of organisms and second grouping of negative pairs of organisms (related to co-inclusion and co-exclusion, respectively) into one or more aggregate matrices representing the possibility of co-inclusion (e.g., the whole number of potential associations between all the taxa in the pool, associations that are described as system relevant interdependencies including: biotic interactions, environmental affinities, dispersal restrictions, etc.) and co-exclusion of species (or other taxonomic units) in the metacommunity(ies) associated with the set of samples; subdividing the one or more aggregate matrices into a set of individual matrices containing features associated with only the species (or other taxonomic units) occurring in each of the set of samples; performing co-inclusion and/or co-exclusion estimations in another suitable manner (e.g., based upon covariance determination methods, based upon correlation determination methods, with SparCC, with SPIEC-EASI, etc.); processing the set of individual matrices in order to generate a set of undirected network mappings with nodes representing species (or other taxonomic units) and edges representing statistically significant co-inclusions/co-exclusions; and performing other suitable data processing steps.
Then, in relation to step S130, the computing platform can implement architecture for extracting features from the set of undirected network mappings, where features can include one or more of: a number of connected components (i.e., defined in relation to a subnetwork in which any two nodes connect to each other by edges, and which lacks connection to other nodes in the full network); a modularity factor (e.g., a quality of a partition into modules such as groups of nodes using a quantity of edges inside modules compared to a quantity of edges between modules, using an appropriate clustering algorithm (e.g., walktrap, Louvain, fast greedy, edge-betweenness, etc.)); a clustering coefficient (e.g., based upon a transitivity determination and defined as the ratio of triangles to connected triples in a respective network); an average path length between network components (i.e., defined as a mean of the minimal number of required edges to connect any two nodes); an assortativity factor (e.g., a feature which measures homophily of a network, according to node properties or labels such as node degree, which quantifies the number of edges associated with a node); a proportion of co-inclusion factor normalized to a total number of combinations of all OTUs in the sample(s); a proportion of co-exclusion factor normalized to a total number of combinations of all OTUs in the sample(s); and other suitable features. In variations, networks can be visualized or rendered by the computing platform, in order to generate depictions of network topology in multidimensional space (e.g., in relation to generation of reports or execution of actions described in further detail below).
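
By way of illustration only, three of the features listed above (number of connected components, transitivity, and average path length) can be computed from an undirected edge list as sketched below. The sketch uses plain adjacency sets rather than any particular graph library; the function names are hypothetical. Transitivity is computed as closed triples over all connected triples (equivalently, three times the number of triangles over connected triples), and average path length is taken over pairs of nodes that are reachable from one another, which is an assumption for disconnected networks.

```python
from collections import deque
from itertools import combinations

def build_graph(edges):
    """Adjacency sets for an undirected network (nodes without edges are omitted)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_distances(adj, src):
    """Shortest path length (in edges) from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def connected_components(adj):
    """List of node sets, one per connected component."""
    seen, comps = set(), []
    for start in adj:
        if start not in seen:
            comp = set(bfs_distances(adj, start))
            seen |= comp
            comps.append(comp)
    return comps

def transitivity(adj):
    """Closed connected triples divided by all connected triples."""
    closed = triples = 0
    for nbs in adj.values():
        for a, b in combinations(nbs, 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0

def average_path_length(adj):
    """Mean shortest path length over ordered pairs of mutually reachable nodes."""
    total = pairs = 0
    for src in adj:
        dist = bfs_distances(adj, src)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Toy network (made up): a triangle A-B-C with a pendant node D, plus a separate pair E-F.
adj = build_graph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("E", "F")])
print(len(connected_components(adj)), transitivity(adj), average_path_length(adj))
```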

As described above, step S130 can also generate features associated with compositional aspects and/or functional aspects of the sample(s) from the agriculture site(s). For instance, compositional and functional aspects can include compositional aspects at the microorganism level, including parameters related to distribution of microorganisms across different groups of kingdoms, phyla, classes, orders, families, genera, species, subspecies, strains, infraspecies taxon (e.g., as measured in total abundance of each group, relative abundance of each group, total number of groups represented, etc.), and/or any other suitable taxa. Compositional and functional aspects can also be represented in terms of operational taxonomic units (OTUs), amplicon sequence variants (ASVs) or other units. Compositional and functional aspects can additionally or alternatively include compositional aspects at the genetic level (e.g., regions determined by multi-locus sequence typing, 16S sequences, ITS sequences, other genetic markers, other phylogenetic markers, etc.). Compositional and functional aspects can include the presence or absence or the quantity of genes associated with specific functions (e.g., enzyme activities associated with nutrient metabolism, disease resistance, biocontrol microorganisms and metabolites, stress sensing and resistance microorganisms and metabolites, phytohormone production, etc.).

In a specific example related to abundance-associated features, step S130 included implementation of methods for determining differential abundance of various OTUs. In more detail, zero counts in data were replaced, where valid values for replacement were calculated under a Bayesian paradigm, assuming a Dirichlet prior. Non-zero values were then adjusted to maintain the overall composition using a pairwise comparison process for differential expression analysis (e.g., edgeR algorithm). For each OTU, the fold change attributable to a treatment (e.g., biostimulant) across different times (e.g., T0 to T1) was calculated. This was done by conducting a hypothesis test separately for each location, measuring the fold change of a given OTU in the treatment group (from T0 to T1) vs. the fold change in the control group (from T0 to T1), and then repeating the test but using times T0 and T2.
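
By way of illustration only, the quantity being compared in the example above (fold change in the treatment group versus fold change in the control group for one OTU at one location) can be sketched as a difference of log2 fold changes. This sketch does not reproduce the Dirichlet-based zero replacement, the edgeR procedure, or the hypothesis test; the function names and the pseudocount are assumptions made for the example.

```python
import math

def log2_fold_change(before, after, pseudocount=1.0):
    """log2 ratio of mean abundance after vs. before; the pseudocount
    (an assumption) keeps the ratio defined when a mean is zero."""
    mean_before = sum(before) / len(before)
    mean_after = sum(after) / len(after)
    return math.log2((mean_after + pseudocount) / (mean_before + pseudocount))

def treatment_effect(treated_t0, treated_t1, control_t0, control_t1):
    """Fold change attributable to treatment: change over time in the
    treated group minus the change over the same interval in controls."""
    return (log2_fold_change(treated_t0, treated_t1)
            - log2_fold_change(control_t0, control_t1))

# Toy counts for a single OTU (made up): replicate samples at T0 and T1.
effect = treatment_effect([7, 7], [31, 31], [3, 3], [7, 7])
print(effect)
```

A positive value indicates that the OTU increased more in the treated group than in the control group over the same interval; the same call can be repeated with T2 counts in place of T1.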

In variations, network properties can be determined for different types of organism communities (e.g., bacterial communities, fungal communities, etc.) independently of each other or in an aggregated manner.

Network properties can further include local network features extracted from a metacommunity network, and network properties from co-exclusion networks and network properties from co-inclusion networks.

Network property determination can further include estimating network properties from a given RNA-seq sequencing sample by way of: estimating a meta-network from the entire sample dataset, estimating the local network as an induced subgraph at the sample level of the bigger meta-network graph, and calculating its network properties, where each network is considered a collection of pairwise connections. Features can be derived from combinations of p-hypergeometric (PH) network properties and Bayesian factor (BF) network properties (and/or Bayesian Fisher network properties, Markov chain Monte Carlo network properties, other properties) and can include one or more of: transitivity properties, modularity properties, assortativity properties, total number of species represented, and other suitable properties.

In examples, p-hypergeometric (PH) network operations performed can implement assumptions that the co-occurrence of two events that are not dependent follows a hypergeometric distribution. In examples, an occurrence is defined as a count above 0. Hence, for a given pair of OTU counts, a co-occurrence is when both have a count >0 in the same sample. The probability of observing the pattern of co-occurrences across the whole dataset is tested against the theoretical hypergeometric distribution (hence the name p-hypergeometric, or PH network). Pairs of OTU occurrences that happen significantly more times than would be expected by chance are labelled as a co-inclusion connection. Conversely, pairs of occurrences that happen significantly fewer times than would be expected by chance are labelled as having a co-exclusion connection.
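
By way of illustration only, the PH test for a single pair of OTUs can be sketched as follows. The sketch treats a count above 0 as an occurrence, computes one-sided tail probabilities of the observed number of co-occurrences under the hypergeometric distribution, and labels the pair accordingly. The significance level alpha, the function names, and the absence of any multiple-testing correction are assumptions made for the example.

```python
from math import comb

def hypergeom_pmf(k, n_samples, n_a, n_b):
    """P(exactly k co-occurrences) when OTU A occurs in n_a and OTU B in
    n_b of n_samples samples, and the two occur independently."""
    return comb(n_a, k) * comb(n_samples - n_a, n_b - k) / comb(n_samples, n_b)

def classify_pair(counts_a, counts_b, alpha=0.05):
    """Label one OTU pair from per-sample counts (same sample order)."""
    present_a = [c > 0 for c in counts_a]
    present_b = [c > 0 for c in counts_b]
    n = len(present_a)
    n_a, n_b = sum(present_a), sum(present_b)
    k_obs = sum(1 for a, b in zip(present_a, present_b) if a and b)
    # Range of co-occurrence counts that is possible given the margins.
    k_min, k_max = max(0, n_a + n_b - n), min(n_a, n_b)
    p_more = sum(hypergeom_pmf(k, n, n_a, n_b) for k in range(k_obs, k_max + 1))
    p_fewer = sum(hypergeom_pmf(k, n, n_a, n_b) for k in range(k_min, k_obs + 1))
    if p_more < alpha:
        return "co-inclusion"
    if p_fewer < alpha:
        return "co-exclusion"
    return "none"

# Toy counts across 10 samples (made up).
a = [4, 9, 2, 7, 1, 0, 0, 0, 0, 0]
together = [3, 1, 8, 2, 6, 0, 0, 0, 0, 0]   # occurs in the same samples as a
apart = [0, 0, 0, 0, 0, 5, 2, 9, 1, 4]      # occurs only where a is absent
mixed = [2, 6, 0, 0, 0, 1, 3, 7, 0, 0]      # no clear pattern
print(classify_pair(a, together), classify_pair(a, apart), classify_pair(a, mixed))
```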

In examples, Bayesian factor (BF) network operations performed can implement assumptions that do not consider 0 to be a threshold value. Instead, counts themselves are classified. For example, in a given sample a first OTU of a pair has 20 reads and a second OTU of a pair has 50 reads. The “common count” or CC is 20 reads (by definition, always the minimum of the two). The absolute difference or AD between them is 30. In BF operations, the “balance” between these two counts is measured for every possible pair of OTUs (in practice, by measuring the odds of seeing the current distribution of CC and AD). To generate the network, operations include generating a Bayesian factor for each sample for the odds of co-occurrence of any given pair. Since the BF is a continuous number, filtering can be performed at any point. In an example, filtering can be performed at BFmedian>1.2 for co-inclusion and BFmedian<0.8 for co-exclusion.
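
By way of illustration only, the common count, the absolute difference, and the example filtering thresholds described above can be sketched as follows. The computation of the Bayesian factor itself is not specified above and is therefore not shown; the sketch takes per-sample factors as given and assumes that BFmedian denotes the median of those per-sample values. The function names are hypothetical.

```python
from statistics import median

def common_count(reads_a, reads_b):
    """CC: the reads shared by both OTUs, i.e., the smaller of the two counts."""
    return min(reads_a, reads_b)

def absolute_difference(reads_a, reads_b):
    """AD: the absolute difference between the two counts."""
    return abs(reads_a - reads_b)

def classify_pair(per_sample_bf, upper=1.2, lower=0.8):
    """Label a pair from its per-sample factors using the example
    thresholds (median > 1.2 co-inclusion, median < 0.8 co-exclusion)."""
    bf_median = median(per_sample_bf)
    if bf_median > upper:
        return "co-inclusion"
    if bf_median < lower:
        return "co-exclusion"
    return "none"

# The worked example from the text: 20 reads and 50 reads in one sample.
print(common_count(20, 50), absolute_difference(20, 50))
# Hypothetical per-sample factors for one pair (made up).
print(classify_pair([1.1, 1.5, 2.0]))
```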

Furthermore, additional interactions characterized between organisms can include commensalism factors, facilitation factors, mutualism factors, antagonism factors, competition factors, neutralism factors, and amensalism factors, which are derived from combinations of positive, negative, and neutral interactions.

2.3.1 Statistical Analyses

In variations, features can additionally or alternatively be processed with one or more statistical or other mathematical processes, in order to generate derivative features derived from outputs of steps S120 and/or S130. For instance, processing of features can include one or more of: implementation of principal component analysis (PCA) methods; generating measurements of variance; implementation of correlative tests (e.g., Spearman correlations); implementation of variance tests (e.g., Kruskal-Wallis tests); implementation of multidimensional scaling processes (e.g., a non-metric multidimensional scaling (nMDS) algorithm); performing probabilistic methods; implementation of statistical models (e.g., generalized linear models, etc.); and performing other suitable statistical tests.

In examples, the method can include steps for performing alpha-, beta-, and/or gamma-diversity analyses in relation to various taxonomic groups and/or associated features. For instance, variations of the method include steps for performing alpha- and beta-diversity analyses using 16S and ITS ASV or OTU counts (e.g., using R vegan), where alpha-diversity metrics (e.g., Shannon, richness, etc.) were calculated and plotted across all covariates available. The algorithm also implemented architecture for performing Wilcoxon rank-sum tests to compare samples associated with different treatments (e.g., control and treatment groups) within location-timepoint subgroups. For beta-diversity, the algorithm implemented Kruskal's non-metric multidimensional scaling in conjunction with Aitchison distances. The algorithm also implemented architecture for determining relative abundances for OTUs as well as generating annotations at various taxonomic levels (e.g., genera, families, etc.), for use in subsequent analyses. The algorithm also implemented architecture for performing permutational multivariate analysis of variance on the Aitchison distance matrix, using all possible combinations of the location, timepoint and treatment variables.
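
By way of illustration only, the alpha-diversity metrics and the Aitchison distance named above can be sketched in plain Python as follows (the example above used R vegan; this sketch is not that implementation). Shannon diversity is computed with the natural logarithm, and the Aitchison distance is computed as the Euclidean distance between centered log-ratio vectors. The pseudocount used to handle zero counts and the function names are assumptions made for the example.

```python
import math

def richness(counts):
    """Number of taxa observed (count above 0) in one sample."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon diversity H = -sum(p * ln p) over observed taxa."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def aitchison_distance(x, y, pseudocount=0.5):
    """Euclidean distance between the centered log-ratio vectors of two
    samples; the pseudocount (an assumption) avoids log(0)."""
    def clr(counts):
        logs = [math.log(c + pseudocount) for c in counts]
        mean_log = sum(logs) / len(logs)
        return [v - mean_log for v in logs]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

# Toy samples (made up): counts for the same four taxa in two samples.
sample_1 = [10, 10, 10, 10]
sample_2 = [35, 3, 0, 2]
print(richness(sample_2), shannon(sample_1), aitchison_distance(sample_1, sample_2))
```

Because the distance depends only on ratios between taxa, two samples with the same composition but different sequencing depth are at distance zero when no pseudocount is applied.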

However, other variations of methods for characterizing diversity metrics and/or other statistical methods can be implemented.

2.3.2 First Example of Generation of Network Properties

In a first example of generation of network properties informative of agriculture site characteristics and ultimately crop-associated features, step S130 included building a metacommunity network of all samples with architecture for: estimating the co-occurrence and co-exclusion that would occur solely by chance for all possible OTU pairs. The algorithm also implemented architecture for selecting OTU pairs that occurred significantly more than expected by chance to create the co-occurrence networks. Similarly, those that occurred significantly fewer times than expected by chance were used to build the co-exclusion network. Local networks (e.g., single sample-level) were calculated by subsetting the metacommunity network for OTU pairs detected in each sample and estimating a local network. The algorithm also implemented architecture for calculating network properties, according to methods described, where network properties were compared using a linear model. Using the network property as outcome, hypothesis tests were performed to compare timepoint differences in various sample groups (e.g., treated vs. control group), in a manner analogous to the approach used for investigating differential abundances described above.
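
By way of illustration only, the subsetting of the metacommunity network into a sample-level local network can be sketched as follows. The sketch keeps only the metacommunity edges whose two OTUs are both detected in the sample (an induced subgraph) and then computes the proportion of retained edges relative to all possible OTU pairs in that sample. The function names and data layout are assumptions made for the example.

```python
def local_network(meta_edges, sample_counts):
    """Edges of the metacommunity network whose two OTUs are both
    detected (count above 0) in the given sample."""
    present = {otu for otu, c in sample_counts.items() if c > 0}
    return [(u, v) for u, v in meta_edges if u in present and v in present]

def edge_proportion(local_edges, sample_counts):
    """Retained edges divided by the number of possible pairs among the
    OTUs detected in the sample."""
    n = sum(1 for c in sample_counts.values() if c > 0)
    possible = n * (n - 1) // 2
    return len(local_edges) / possible if possible else 0.0

# Toy co-inclusion metacommunity network and one sample (made up).
meta_edges = [("otu1", "otu2"), ("otu2", "otu3"), ("otu3", "otu4")]
sample = {"otu1": 12, "otu2": 4, "otu3": 0, "otu4": 7}
edges = local_network(meta_edges, sample)
print(edges, edge_proportion(edges, sample))
```

The same two functions can be applied to a co-exclusion metacommunity network to obtain the corresponding co-exclusion proportion for the sample.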

Other examples of generation and use of network properties are described in U.S. application Ser. No. 17/119,972 filed 11 Dec. 2020, which is herein incorporated in its entirety by this reference.

2.4 Methods—Model Development and Use

Step S140 recites: returning an analysis characterizing the set of crop-associated features based upon the set of microbiome-associated features. Step S140 functions to implement feature data as inputs, and to generate outputs corresponding to crop-associated features (e.g., estimations of predicted yield, etc.). Step S140 can additionally or alternatively be used to predict or characterize agriculture site statuses and/or responses to various perturbations with respect to downstream applications.

In relation to estimations or other predictions returned using step S140, computing platform subsystems described can implement architecture for processing features, generating network data, and providing insights, with training of models by processing suitable training datasets. In particular, the emergent properties and/or other features described, given the ultra-high dimensionality of microbiome data, are not practically detectable by the human mind, and are instead trained and processed by the machine learning architecture in relation to associated steps of the method 100.

In one aspect, as shown in FIG. 2, a generated model can include architecture for processing features (e.g., estimates of alpha-, beta-, and/or gamma-diversity, co-inclusion features, co-exclusion features, other features described above) as inputs, and returning derivative outputs describing crop-associated features, where processing of input features can occur at multiple layers of the model(s) in relation to model architecture.

In variations, crop-associated features can include one or more of: yield characteristics, crop health and disease states, crop age characteristics (e.g., lifespan, cycles of productivity, vegetative growth state, etc.), crop shelf life, and/or other crop-associated characteristics. As such, embodiments of the method can include steps for predicting crop yield based solely upon microbiome characteristics of samples acquired from the agricultural site associated with the crops.

Additionally or alternatively, in variations, outputs of step S140 can include one or more of: nutritional composition features (e.g., of soil) at the agriculture site, soil carbon sequestration characteristics, and other features associated with the agriculture site.

As shown in FIG. 2, generation of crop-associated and/or other features can include: processing first features with a dimensionality reduction operation S141 (e.g., PCA) as described above. Then, in order to evaluate the degree to which local network properties deviate from a null model expectation (e.g., based upon an expected unperturbed state sampled from similar sites), the computing platform can include architecture for processing matrices containing only the species occurring in each of the individual samples, which were randomized across the metacommunity co-inclusion/co-exclusion matrices.

The computing platform can then, as shown in FIG. 2, include architecture for calculating a measure of variance between an observed output parameter and an expected output parameter S142, followed by assessment of whether a factor (e.g., agricultural input, weather, management, applied product, etc.) had an effect on network properties (e.g., through Spearman correlations, through Kruskal-Wallis tests, regression models, machine learning algorithms, etc.) as described above.

To estimate the relative contribution of an input factor and/or other characteristic related to the crop-associated factor, network properties, and/or other properties, the computing platform can include, as shown in FIG. 2, architecture for performing a partitioning analysis S143 using the non-metric multidimensional scaling (nMDS) two-dimension scores as the response variables.

In variations, the computing platform can include architecture for calculating predicted crop-associated features (e.g., yield) and/or other crop or agriculture site characteristics, by fitting variables derived from features described above (e.g., in relation to a transitivity feature, in relation to a modularity feature, in relation to an average path length feature, in relation to a co-exclusion proportion parameter, etc.) into a generalized linear model (GLM) with a suitable distribution (e.g., a binomial distribution) as well as non-parametric (e.g., kernel regression, k-nearest neighbours, etc.) and machine learning (e.g., random forest, LASSO-Ridge regression, etc.) models.
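
By way of illustration only, one of the non-parametric options named above (k-nearest neighbours) can be sketched as a regression that predicts a crop-associated value from network-derived features. The feature vectors here are hypothetical pairs of (transitivity, co-exclusion proportion), the yield values are made-up toy numbers, and the function name and the choice of unscaled Euclidean distance are assumptions made for the example.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Predict a value for `query` as the mean outcome of the k training
    samples whose feature vectors are closest in Euclidean distance."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

# Toy training data (made up): (transitivity, co-exclusion proportion) -> yield.
train_x = [(0.30, 0.50), (0.32, 0.48), (0.60, 0.20), (0.62, 0.22)]
train_y = [8.0, 8.4, 5.0, 5.2]

prediction = knn_predict(train_x, train_y, (0.31, 0.49), k=2)
print(prediction)
```

In practice, features on different scales would typically be standardized before distances are computed, and k would be chosen by cross-validation.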

As such, architecture of the systems described in relation to step S140 can process input features and return outputs that are indicative of statuses and responses to various perturbations, which can be used in downstream portions of the method 100 in order to improve or maintain characteristics of the agriculture site(s) being analysed in a desired and/or sustainable manner.

With further training, advanced models can further be configured to generate crop-associated feature predictions based on microbiome-derived data, without knowledge of other information, in order to characterize aspects of the agriculture sites and/or crops.

2.4.1 Models and Machine Learning Approaches

To refine the model(s), the method 100 can include generating one or more training sets of data, from samples of the agriculture site(s) and/or other samples of other agriculture site(s), in order to train the artificial intelligence (AI)/neural network (NN) model(s) in one or more stages of training, to identify features of interest from various inputs. In variations, generating training sets of data can include processing raw data and/or features taken from agriculture sites and/or crops with known characteristics (e.g., in relation to contextual and/or other data described above, in relation to agricultural inputs/practices applied in substantially controlled settings, etc.). Such training data can be tagged with associated crop-associated features, agriculture site statuses (e.g., health statuses) and/or other information (e.g., pertaining to nature of inputs/practices, etc.).

In examples, training data can include tagged contextual information, which can include environmental information, geolocation information, nature of products applied (e.g., dosing, duration of application, frequency of application), pathogens present at a site, and/or other suitable information.

Training sets of data can include raw sequencing data, transformed sequencing data (e.g., according to transformation operations described above), and/or other suitable data. As such, as shown in FIG. 3, the method 100 can include: generating one or more training datasets S145 from a set of agriculture sites and/or crops (e.g., sites different from those in step S110, sites overlapping with those in step S110), the training datasets corresponding to features (e.g., of emergent properties, of community properties, of taxonomic properties, of functional properties, etc.) in association with statuses and/or inputs or practices experienced by the agriculture site(s) and associated crops; applying one or more of a set of transformation operations to the one or more training datasets (e.g., using one or more operations described above) S146; and training a machine learning model comprising architecture for returning at least one of the set of unique signatures and the analysis, in one or more stages, based upon the one or more training datasets S147. Additional details are provided below.

For instance, in relation to generation of training datasets, the method can include generating network properties/emergent properties and other features upon samples from agriculture sites (or other sites) where statuses and/or perturbations are known. Additionally or alternatively, first training datasets can be generated from network properties/emergent properties and other features upon processing samples from agriculture sites (or other sites) known to be at baseline state. The model can be trained based upon the first training datasets. Then, the site(s) and/or associated crops can be intentionally perturbed in some manner, with subsequent sample acquisition and processing used to generate second training datasets for refining the model. This process can be repeated any suitable number of times. As such, training data can be developed in multiple stages. In relation to multiple stages of training, the method 100 can refine models based upon incorrect classification of outputs (e.g., mis-characterized statuses and/or perturbations).

Furthermore, combinatorial features (e.g., combination features derived from one or more individual network properties, one or more community properties, one or more taxonomic properties, and/or other suitable properties) can be used for training. In more detail, features may be transformed either individually or in combination before being processed by the model(s). As an example of an individual feature transformation, a feature derived from a transform of a co-exclusion feature might be used instead of or in addition to the co-exclusion feature itself. As an example, a combinatorial feature can be derived from synchronous co-exclusion of a pair of organisms and co-inclusion of a pair of organisms (e.g., where occurrence together is a feature). Additionally or alternatively, combinatorial features based upon bacteria-associated parameters and fungal-associated parameters can be used as inputs (e.g., as a unified “impact” parameter or feature). For instance, an impact parameter can be derived from the scaled dissimilarity (distance) between the network properties (e.g., 16S network properties, ITS network properties), as described in U.S. application Ser. No. 17/119,972 filed 11 Dec. 2020, incorporated by reference above.

Additionally or alternatively, dynamic aspects (e.g., changes over time in features, changes in frequency between instances of respective features, other temporal aspects, other frequency-related aspects, etc.) of features derived from the samples can be used to predict or otherwise anticipate statuses. As such, models can be implemented to prevent adverse statuses of the agriculture sites, to address root causes of failure, and/or to break chains of events that could lead to a cascade of agriculture site problems.

Models can be developed and trained for real-time analyses and/or historical analyses. In relation to real-time analyses, the models can be refined for rapid classification (e.g., with node reduction, with reduced thresholds, with lower confidence, etc.). In relation to historical analyses, the models can be refined for detailed classification (e.g., without node reduction, with higher thresholds for classification predictions, with higher confidence, etc.).

In embodiments, the method 100 can thus include training a model configured to process input features and return predicted characterizations of crop-associated features and/or other features of the agriculture site, wherein training the model comprises: collecting a training dataset derived from training samples, the training dataset corresponding to training samples subject to at least one of the management practice and the perturbation to the agriculture site, as well as control samples not subjected to the input factor; applying one or more of a set of transformation operations to the training dataset; and training the model with the training dataset, the model comprising architecture for returning the analysis, in one or more stages.

While embodiments, variations, and examples of models (e.g., in relation to inputs, outputs, and training) are described above, models associated with the method 300 can additionally or alternatively include other blocks for statistical analysis of data and/or machine learning architecture.

Statistical analyses and/or machine learning algorithm(s) can be characterized by a learning style including any one or more of: supervised learning (e.g., using back propagation neural networks), unsupervised learning (e.g., K-means clustering), semi-supervised learning, reinforcement learning (e.g., using a Q-learning algorithm, using temporal difference learning, etc.), and any other suitable learning style.

Furthermore, any algorithm(s) can implement any one or more of: a regression algorithm, an instance-based method (e.g., k-nearest neighbor, learning vector quantization, self-organizing map, etc.), a regularization method, a decision tree learning method (e.g., classification and regression tree, chi-squared approach, random forest approach, multivariate adaptive approach, gradient boosting machine approach, etc.), a Bayesian method (e.g., naive Bayes, Bayesian belief network, etc.), a kernel method (e.g., a support vector machine, a linear discriminant analysis, etc.), a clustering method (e.g., k-means clustering), an association rule learning algorithm (e.g., an Apriori algorithm), an artificial neural network model (e.g., a back-propagation method, a Hopfield network method, a learning vector quantization method, etc.), a deep learning algorithm (e.g., a Boltzmann machine, a convolution network method, a stacked auto-encoder method, etc.), a dimensionality reduction method (e.g., principal component analysis, partial least squares regression, etc.), an ensemble method (e.g., boosting, bootstrap aggregation, gradient boosting machine approach, etc.), and any suitable form of algorithm.

2.4.2 Geographical and Phenological Stage Dynamics

In some embodiments, the methods described can thus include: collecting/receiving samples across a set of time points, and returning characterizations of evolving population dynamics within fungal and/or other organism populations at the agriculture site, based upon alpha diversity and beta diversity patterns. Additionally or alternatively, collecting/receiving samples across a set of time points can be used to evaluate effects of actions executed at the agriculture site(s), for instance, in relation to actions performed in association with repeated instances of Step S140 below. As shown in FIGS. 4A-4D, embodiments of a portion of the model returned a clear population dynamic occurring from T0 (before planting) to T1 and T2 samples (i.e., one and two months after planting, respectively) in all locations. FIG. 4A shows that in terms of beta-diversity of bacterial populations, both the location (R²=0.24) and the phenological stage (R²=0.21) had significant effects, with the treatment (R²=0.01) having a minor non-significant effect. However, for fungal populations (FIG. 4C), location dominates as the main driver of the beta-diversity patterns (R²=0.36), with the phenological stage having a much lower impact (R²=0.08) than in bacterial populations, and the treatment (R²=0.01) showed a minor but significant effect. As shown in FIGS. 4A and 4C, a first location (e.g., White Pigeon) is significantly different from second and third locations (e.g., Grant and Sutton); this can be easily explained by the geographical distance between locations, which correlates well with the Aitchison distances of samples in the PCoA analysis. Contextual data processed also captured different edaphological and weather conditions at each of these locations, as input factors affecting diversity patterns. The significant differences between microbial community compositions before and after planting can be clearly seen in FIGS. 4A and 4C, where, despite the large differences between locations, T1 and T2 samples clustered in all the three locations, away from their respective T0, especially in the case of bacterial populations.

Regarding alpha-diversity (FIGS. 4B and 4D), outputs of the model were used for characterizing changes in richness and diversity at the agriculture site(s). In more detail, model outputs demonstrated the impact of planting in reducing the diversity of bacterial and fungal populations, as shown for both OTU/ASV richness and Shannon (H′) index values from T0 to T1 and until time T2 in most cases, indicating that the phenological stage of the plant is the main driver of changes at the alpha-diversity level in both bacterial and fungal populations. Comparing control versus treated samples at the same time point, we observed significant changes (e.g., in Grant-sourced samples) at T1 for bacterial richness and Shannon index as well as fungal Shannon index. The model also returned outputs that demonstrated that Grant was the site with the best yield increase response due to treatment. When soil samples were again analyzed after harvest (T3) in Grant and Sutton locations, there were no significant changes in alpha-diversity between the microbial communities found in the soil before planting (T0) and after harvesting (T3); therefore, the plant's associated soil microbiota seems to have cycled back to its original state.
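As an illustration of the two alpha-diversity measures named above, the following is a minimal sketch of richness and the Shannon (H′) index computed from a vector of OTU/ASV counts. The function names and the example count vectors are hypothetical and are not taken from the described study; the natural logarithm is assumed, since the log base is not stated.

```python
import math

def richness(counts):
    """Number of OTUs/ASVs observed at least once in a sample."""
    return sum(1 for c in counts if c > 0)

def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over observed taxa.

    Assumption: natural logarithm; the source does not state the base.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical counts for one site before planting (T0) and after (T1).
t0_counts = [40, 30, 20, 10]
t1_counts = [85, 10, 5, 0]

print(richness(t0_counts), richness(t1_counts))
print(shannon_index(t0_counts) > shannon_index(t1_counts))
```

In this invented example both measures drop from T0 to T1, which is the direction of change the paragraph above reports for most cases.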

At the taxonomy level, the model returned outputs demonstrating clear population dynamic patterns from T0 to T2 sampling times in all the three locations and in both treated and untreated samples, as well as abundant genera for both bacterial and fungal communities. FIG. 5 shows the top bacterial genera identified across samples in this study (core microbial species). Among the top fungal genera shared across samples in our study (core fungal species) we found Cryptococcus, Mortierella, and Alternaria. Outputs of the model also demonstrated clear temporal (cyclical) dynamics which differentiate bulk soil (T0 and T3) and rhizosphere soil (T1 and T2) samples (FIGS. 4A-4D).

2.4.3 Crop Feature—Yield in Response to a Bioinoculant Example 1

Sets of samples can be collected contemporaneously with application of a bioinoculant at the agriculture site, and returning the analysis can include returning a yield characterization of crops at the agriculture site in response to application of the bioinoculant. In one example application of methods and models described, yield was characterized for potato crops, in response to treatment by a microbial bioinoculant (B. amyloliquefaciens strain QST713), with evaluation of rhizosphere microbiota and bulk soil microbiota associated with acquired samples from multiple locations. In this example, treatment comprised an in-furrow application of the biological biostimulant during planting of tubers. The biostimulant contained a minimum of 2.7×10¹⁰ CFU/g of B. amyloliquefaciens QST713, and it was applied at a dose of 0.935 l/ha. However, variations of treatment can include other methods of application and/or other dosing.

In particular, yield data was first explored using medians and interquartile ranges (IQRs) of data, with rank sum tests (e.g., Wilcoxon rank sum tests) performed. The OTU counts (as described in variations of methods above) were transformed by model architecture using the centered log-ratio (CLR) transformation. CLR-transformed 16S and ITS data were jointly projected onto 70 principal components. The method was further structured to fit a Random Forest model with architecture for determining if a rhizosphere or bulk soil sample was sourced from a block with a yield ≤30 t/ha or >30 t/ha, based on its microbiome composition and structure using multivariate compositional data (Principal Components from a beta-diversity ordination) and local network properties. Factors derived from geographical and phenological stages (e.g., examples of which are discussed in Section 2.4.2 above) were also analyzed.
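The CLR transformation mentioned above can be sketched as follows: each count is log-transformed and centered on the mean log value of its sample. The pseudocount used here to handle zero counts is an assumption made for illustration, since the text does not state how zeros were treated, and the function name is hypothetical.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's OTU counts.

    Assumption: a pseudocount is added so that zero counts have a finite
    logarithm; the zero-handling strategy is not specified in the source.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Hypothetical OTU counts for a single sample.
sample = [10, 0, 5, 85]
transformed = clr(sample)
print([round(v, 3) for v in transformed])
```

By construction the CLR values of a sample sum to zero. The transformed vectors are the kind of input that would then be passed to a principal component projection and a classifier; those steps are not shown here.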

In more detail, the method was configured to measure yield data in 20 plots treated by the bioinoculant and 20 untreated plots (from multiple geographical locations), and for each we utilized all samples available over times T0, T1 and T2. In total, 112 samples were processed by the example method and split into a training set of 84 samples and a test set of 28 samples. The result of this model showed a predictive accuracy of 78.6% (FIG. 6A) and identified four variables/factors (e.g., two network properties and two compositional properties) and associated values as a set of important predictors of yield (FIG. 6B), even with a higher importance than location (variable importance was based on the Gini index). The method was able to identify previously-unknown features (e.g., the structure of fungal communities, fungal co-occurrence transitivity and co-exclusion proportion, etc.) that demonstrated a much higher predictive value than the structure of bacterial communities (FIG. 6B).

The example model also returned outputs characterizing an inverse correlation between the co-occurrence transitivity of bulk and rhizosphere soil fungal communities and the yield found in the potato cultivars. Such an output provides insights into understanding the effects of agricultural inputs (e.g., of a B. amyloliquefaciens-based biostimulant, of other inputs, etc.) in shaping the structure of fungal communities as a potential mechanism of action when increasing the yield. Other variations of insights with actionable outcomes in relation to improving yield and/or agricultural site statuses are further described. As shown in FIGS. 7A-7C, in going from T0 to T1 the increase in fungal co-occurrence transitivity in one geographical location (e.g., Grant) is greater in the control samples than the treated ones, and in another location (e.g., Sutton) the treatment was also found to increase yield significantly. In a third location (e.g., White Pigeon) the model returned outputs indicating a decrease in fungal co-occurrence network transitivity in going from T0 to T1 in treated samples, and even a more marked decrease in control samples, corresponding to a lower yield state. Two compositional variables (PC3 and PC1) contributing to the predictive power of the model were also explored by processing taxonomy of the OTUs (e.g., fungal biocontrol agent Trichoderma sp.) and/or in relation to interaction patterns among various OTUs, and returning yield predictions.

In relation to the example model, yield was treated as a constant for all samples within a location and treated as a bimodal/multimodal categorical variable in relation to yield per area (e.g., bimodal as ≤30 t/ha, >30 t/ha; multimodal as ≤26 t/ha, >26 t/ha to ≤35 t/ha, >35 t/ha; multimodal as ≤20 t/ha, >20 t/ha to ≤26 t/ha, >26 t/ha to ≤35 t/ha, >35 t/ha; etc.). The example model returned outputs indicating that fungal co-occurrence transitivity and fungal co-exclusion proportion had high predictive power in relation to yield, independent of the number of yield categories used. Bacterial co-exclusion proportion, fungal co-inclusion modularity, and PC12 also demonstrated high predictive power. As such, the example model was constructed to process microbiome composition and structure data, with training of a Random Forest model to estimate if a bulk or rhizosphere soil sample came from a low or high yield block with relatively high accuracy. The example model also returned outputs indicating that the structure of fungal communities is a better estimator of potato yield than the structure of bacterial communities but that bacterial community factors and other factors had predictive power in relation to crop yield predictions, with respect to treatment and control groups.
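The yield categories listed above amount to binning a yield value against an ordered set of thresholds, with each threshold belonging to the lower bin (≤). A minimal sketch is given below; the function name `yield_category` and its default threshold are illustrative choices, not part of the described method.

```python
from bisect import bisect_left

def yield_category(yield_t_ha, thresholds=(30.0,)):
    """Return a 0-based yield category index for a yield in t/ha.

    A value equal to a threshold falls in the lower category, matching the
    "<=" bounds in the text. The default reproduces the bimodal
    <=30 / >30 t/ha split.
    """
    return bisect_left(sorted(thresholds), yield_t_ha)

# Bimodal split: <=30 t/ha is category 0, >30 t/ha is category 1.
print(yield_category(30.0), yield_category(30.1))

# Three-category split: <=26, >26 to <=35, >35 t/ha.
print([yield_category(y, (26.0, 35.0)) for y in (26.0, 30.0, 35.0, 36.0)])
```

Because the thresholds are a parameter, the same labelling step can serve the two-, three-, and four-category variants described above without changing downstream code.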

FIG. 9 depicts outputs demonstrating that use of the bioinoculant had a significant effect on increasing the crop yield (Grant p-value 8.66×10⁻¹⁰, and Sutton p-value 7.67×10⁻⁷) in two of the three locations assayed, thereby indicating effectiveness of the bioinoculant in affecting yield at some of the agricultural sites.

Furthermore, outputs of the methods further demonstrated that use of the bioinoculant modulated microbiome composition and structure significantly, as related to yield characteristics of crops harvested from the agricultural sites. In more detail, the method included steps for determining the fold change of each OTU in the bioinoculant treatment group from T0 to T1 (and from T0 to T2) vs. the fold change in the control group at the same time intervals per location. Out of 17,241 unique bacterial OTUs in the samples of the study, 16 changed significantly from T0 to T1 (i.e., one in Sutton, and 15 in White Pigeon), and 100 from T0 to T2 (i.e., 16 in Grant, 79 in Sutton, and five in White Pigeon). These OTUs belong to 73 genera, of which, 13 changed significantly in at least two locations: Bacillus, Bradyrhizobium, Clostridium, Novosphingobium, Rhodoplanes, Sphingomonas, Sphingopyxis, and Woodsholea in Grant and Sutton; Agromyces, Flavobacterium, Pedobacter, and Sporosarcina in Sutton and White Pigeon; and Stenotrophomonas in Grant and White Pigeon. For fungi, out of 1,702 unique OTUs, ten OTUs changed significantly from T0 to T1 (i.e., eight in Sutton and two in White Pigeon), and 32 from T0 to T2 (i.e., 32 in Sutton). These OTUs belong to 30 genera, of which, one changed significantly in at least two locations: Cryptococcus in Sutton and White Pigeon. Thus, despite the location and phenological stage having a larger effect than treatment in the composition of microorganism populations, the inoculant still generated common detectable abundance changes in at least two of the three locations for several taxonomic groups, some of which have known functionally relevant roles (e.g., Bacillus, Bradyrhizobium, Flavobacterium, Pedobacter, Sphingomonas, and Stenotrophomonas) affecting yield and other crop/agricultural site features.
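The comparison of fold changes described above can be sketched for one OTU at one location as follows. This is an illustrative simplification: it assumes mean relative abundances per group, a log2 scale, and a small pseudocount, none of which are specified in the text, and it omits the significance testing used to select OTUs. All names and values are hypothetical.

```python
import math

def log2_fold_change(before, after, pseudocount=1e-6):
    """log2 of the ratio of mean abundance after vs. before.

    Assumption: group means of relative abundances, with a small
    pseudocount to avoid division by zero.
    """
    mean_before = sum(before) / len(before)
    mean_after = sum(after) / len(after)
    return math.log2((mean_after + pseudocount) / (mean_before + pseudocount))

def treatment_effect(treated_t0, treated_t1, control_t0, control_t1):
    """Difference between treated and control log2 fold changes (T0 to T1)."""
    return (log2_fold_change(treated_t0, treated_t1)
            - log2_fold_change(control_t0, control_t1))

# Hypothetical relative abundances of one OTU at one location.
effect = treatment_effect(
    treated_t0=[0.01, 0.01], treated_t1=[0.04, 0.04],
    control_t0=[0.01, 0.01], control_t1=[0.02, 0.02],
)
print(round(effect, 3))
```

A positive value indicates that the OTU increased more in treated blocks than in control blocks over the same interval; a statistical test across replicates would still be needed before calling the change significant.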

The method also included steps for characterizing the co-occurrence and co-exclusion patterns between pairs of OTUs in each sample, as well as steps for estimating ecological emergent properties (i.e., niche specialization, level of competition) which contribute to microbiome functioning. The method thus included architecture for building metacommunities based on all samples and, as an initial filter for bacteria, the method retained OTUs that were detected in at least 30% of the entire dataset, and 90% for fungal communities (e.g., due to the disproportionate number of unique OTUs detected in 16S vs. ITS sequencing, as well as larger variances seen in ITS sequencing). To keep the overall size of the data manageable, the method limited the number of selected OTUs to 4,000 with a maximum of 10 million possible significant pairs, and filtered out OTU pairs that were not significantly (p<0.05) enriched (co-occurrence) or depleted (co-exclusion). This resulted in metacommunity networks comprising 3,339 nodes for bacteria (19.4% of the total 17,241 bacterial OTUs) and 447 nodes for fungi (26.3% of the total 1,702 fungal OTUs), which on average captured 92.11% of the bacterial abundance and 98.62% of the fungal abundance of the samples. The method then implemented model architecture for exploring the structure of local microbiome communities, based on just the nodes and edges present in each individual sample, aiming to detect changes in network properties that are associated with the application of the bioinoculant at a specific location over time. Specifically, for the co-exclusion and co-occurrence bacterial networks, the method included steps for calculating the modularity (i.e., a measure of the strength of partitioning of a network into modules) and transitivity (i.e., a measure of the degree to which nodes in a network cluster together) as well as the proportion of co-exclusions and co-occurrences present in the local network compared to the total number of possible combinations among all OTUs in the sample.
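Two of the local network properties described above can be sketched directly from an edge list. The example below assumes the common definition of global transitivity (closed triples divided by all connected triples, equivalently three times the number of triangles over the number of triples) and treats the proportion as the number of edges divided by the number of possible OTU pairs; the exact formulas used by the described method are not stated. Modularity is omitted because it requires a community partition. Function names and the example network are hypothetical.

```python
from itertools import combinations

def transitivity(edges):
    """Global transitivity of an undirected network given as (u, v) pairs."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    closed = 0   # triples whose two outer nodes are also connected
    triples = 0  # all pairs of neighbors around every node
    for nbrs in neighbors.values():
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in neighbors[a]:
                closed += 1
    return closed / triples if triples else 0.0

def edge_proportion(edges, n_otus):
    """Edges present divided by all possible OTU pairs in the sample."""
    possible = n_otus * (n_otus - 1) // 2
    distinct = {frozenset(e) for e in edges if e[0] != e[1]}
    return len(distinct) / possible if possible else 0.0

# Hypothetical local co-occurrence network: a triangle plus one pendant OTU.
local_edges = [("otu1", "otu2"), ("otu2", "otu3"), ("otu1", "otu3"), ("otu1", "otu4")]
print(transitivity(local_edges), edge_proportion(local_edges, n_otus=4))
```

Applying the same functions to the co-occurrence and co-exclusion edge lists of each sample yields per-sample values that can be compared over time and between treated and untreated blocks.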

FIGS. 7A and 7B depict outputs of the method capturing the evolution from T0 through T2 of four of the six local network properties studied across locations, for bacterial and fungal populations, respectively. FIG. 7C lists those changes that were significant over time (from T0 to T1, and from T0 to T2) in treated vs. untreated blocks. In Grant there was a significant decrease in fungal co-occurrence transitivity and bacterial co-occurrence proportion from T0 to T1 in the treated samples when compared to untreated ones. The model thus captured human intervention in a crop and its effects on altering the structure of microbial communities of the soil, and returned outputs indicating decreased transitivity on the fungal co-occurrence network as a common indicator of these types of alterations. Furthermore, low clustered communities (i.e., those with low transitivity scores) were demonstrated to be associated with highly competitive environments with a high degree of niche specialization, which are among the most relevant properties of an ecosystem when trying to understand its functionality and its response to human interventions and land-use changes. The model also returned outputs indicating a lagged effect (at T2) of the treatment in modifying some network properties of the bacterial communities in both Grant and Sutton. In Grant, the bacterial co-occurrence proportion increases from T0 to T2 (in contrast to the decrease from T0 to T1), and at the same time the transitivity of the bacterial co-occurrence network increases. In Sutton, both the bacterial co-occurrence proportion as well as the bacterial co-exclusion proportion increased from T0 to T2. Thus, when attending to the microbiome structure changes caused by the treatment in Grant and Sutton, which were the locations where treatment had a significant effect over yield, the method highlighted significant treatment-mediated effects over the fungal and bacterial community networks that decreased from T0 to T1, and then increased in T2. Interestingly, and contrary to what was observed in Grant and Sutton, in White Pigeon, the location where treatment did not have a significant effect over yield, there was an increase in the bacterial co-exclusion modularity from T0 to T1.

2.4.4 Crop Feature—Yield Example 2

In a variation of the example methods described, the method was adapted to further process geophysical metadata/physicochemical data to improve accuracy of predictions of yield (e.g., from ~79% to ~86%). In more detail, the model processed the geolocation of each sample and extracted (e.g., from one or more databases) soil physicochemical data associated with location. In the example, geophysical measurements closest to the sampling point for each sample were averaged; however, in other variations, geophysical metadata/physicochemical data can be acquired at the sampling site (e.g., through direct sampling, through extraction of information from databases, etc.). Processing of geophysical metadata/physicochemical data in addition to microbiome and network data improved predictions of yield significantly, where relative importance factors (based on the Gini index) from variations of the models with and without factoring in geophysical/physicochemical data are shown in the table below:

Microbiome + networks                               Microbiome + networks + geophysical data
Variable                              Importance    Variable                              Importance
Fun_Enriched.Transitivity             5.12          Fun_Enriched.Transitivity             2.17
PC3                                   3.41          CLAY                                  1.85
Fun_Depleted.Coexclusion_proportion   3.07          Fun_Depleted.Coexclusion_proportion   1.70
PC1                                   1.45          ECEC                                  1.50
Location                              1.22          PH                                    1.22
Bac_Depleted.Coexclusion_proportion   1.08          PC3                                   1.17
Bac_Enriched.Coexclusion_proportion   1.08          TN                                    1.03
Fun_Enriched.Modularity               1.07          Bac_Enriched.Coexclusion_proportion   1.01
PC50                                  0.80          PC1                                   0.97
PC2                                   0.70          Bac_Depleted.Coexclusion_proportion   0.93

Incorporation of soil physicochemical properties thus improves predictive power of models, and variations of the model can further be adapted to process additional factors (e.g., weather), a reduced number of factors, and/or different combinations of factors to generate further improved predictions of yield.

2.4.5 Crop Feature—Crop Nutrient Characteristics Example 1

Variations of the models described were further adapted for returning predictions of a crop's nutrient data (e.g., potato plant petiole nutrient data) based upon a limited number of input features (e.g., soil microbiome-derived features, soil physicochemical features). In more detail with respect to one example, samples were processed from 17 locations across the U.S., and models returned petiole nutrient data indicating plant nutritional statuses, which serve as a proxy for yield and other crop-associated features.

In more detail, as shown in FIG. 8A, the model processed 16S and ITS OTU count data for 17 locations at two time points, which demonstrated strong effects of location on beta diversity. However, alpha diversity (determined from 16S and ITS OTU data) was significantly different across certain locations, but not significantly different with respect to other groupings, as shown in FIG. 8B. Then, as shown in FIG. 8C, the model was structured to return petiole nutrient data for a number (e.g., 102) of petiole nutrient data observations from 17 locations, where FIG. 8C depicts parts per million (PPM) values of various nutrients (e.g., phosphorus, calcium, potassium, zinc, nitrogen, etc.) converted to percentages for each location. In generating the outputs of FIG. 8D, it was observed that treatment effects are subtle (e.g., treatment effects are more appreciable in some locations, such as locations 1 and 16, where treated samples have roughly higher K values). Furthermore, treatment may also be “acting through” other variables, such as network properties. Furthermore, as shown in FIG. 8C, a single component of a percentage composition cannot be changed without affecting additional percentages. The nutrient data variables were then transformed by the model into log-ratio values with a transformation operation that changed the scale of the petiole nutrient concentrations from 0-100% to −inf to +inf (i.e., the entire real number range), as shown in FIG. 8D. In the example, the transformation operation processed ratios between concentration/relative abundance values for a particular nutrient and other nutrient values iteratively, with subsequent logarithmic scaling, to adjust the scale in FIG. 8D to [−inf, +inf].
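The log-ratio transformation of the petiole nutrient percentages can be sketched as follows. The sketch assumes that, for each nutrient, the ratio is taken against the sum of all other measured nutrients and that a base-2 logarithm is used; the text describes the ratios only as being formed against "other nutrient values", so this is one plausible reading, and the function name and example values are hypothetical.

```python
import math

def log_ratios(composition):
    """Map each nutrient share to log2(element / sum of the others).

    Assumption: "others" is the sum of all remaining nutrients in the
    composition. Every value must be positive and smaller than the total.
    """
    total = sum(composition.values())
    return {
        name: math.log2(value / (total - value))
        for name, value in composition.items()
    }

# Hypothetical petiole composition in percent (not data from the study).
petiole = {"N": 40.0, "K": 40.0, "Ca": 15.0, "P": 5.0}
for element, value in log_ratios(petiole).items():
    print(element, round(value, 3))
```

A nutrient that makes up exactly half of the composition maps to 0, shares above one half map to positive values, and shares below one half map to negative values, so the bounded 0-100% scale is replaced by an unbounded one that is better suited to regression.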

Then, as shown in FIG. 8E, the model performed multidimensional scaling followed by unsupervised clustering of the data to identify clusters of samples, where clusters included samples from single or multiple locations. The model then processed: soil microbiome features (e.g., counts, network properties), texture, and chemical data from multiple time points to generate predictions of petiole nutrient data (and/or data associated with other crop parts), with training of the model based upon a training dataset of 70% of data and a test dataset of 30% of data. The method implemented a multivariate Gaussian LASSO-Ridge regression to model the petiole log-ratios against the input data, where microbiome counts (i.e., 16S and ITS OTU counts) were first CLR-transformed and projected onto a 50-column matrix using classical PCA. Soil chemical and texture data was transformed using a similar procedure as the petiole data. Regression coefficients/fold-changes for multiple time points for each nutrient of interest are shown in FIG. 8F, based on models trained with the training dataset. Nutrients/factors of interest include, but are not limited to: fungi depleted modularity, bacteria enriched modularity, bacteria enriched transitivity, PC15, clay, PC22, PC1, PC7, zinc, PC13, sulfates, PC20, PC25, PC3, potassium, PC17, PC23, PC9, PC10, calcium, PC16, PC8, PC21, PC14, sand, bacterial depleted transitivity, manganese, boron, PC18, PC12, PC6, bacterial depleted modularity, and magnesium (where PC components include compositional components of interest).

In FIG. 8F, the coefficients on the y-axis are fold-changes or “multipliers”, where the horizontal line at 1 represents no change, a value below 1 is a reduction, and a value above 1 is an increase. In particular, from FIG. 8F, the regression coefficients for T0 and T1 data are similar, but not identical (e.g., for phosphorus, the effect of bacterial enriched transitivity is reversed between timepoints). Furthermore, network properties were demonstrated to have global, but not always similar, effects on petiole data variables, but contributed significantly to predictive power in estimating petiole nutrient characteristics and yield.

FIG. 8G depicts predictions of coefficients upon processing the test dataset (e.g., with LASSO-Ridge predictions demonstrating 72-92% accuracy in most locations), where the solid line represents ensemble data and the dashed lines represent predicted T0 values and T1 values, respectively. In particular, with respect to processing the test dataset, locations were labeled in the outputs but were not considered as primary factors in generating the predictions. In FIG. 8G, all values on the y-axis are log-ratios (i.e., log2(element/others)), and the points in FIG. 8G depict the actual values, while lines depict predictions from T0 and T1 respectively. Unexplained outcomes (e.g., in relation to nutrient predictions) can be used for cross-validation of selection of location of sampling in iterations of model development and refinement. The table below depicts percentages of variation explained (i.e., R² values) by the model for each petiole nutrient element associated with the test set:

Petiole element    Ensemble    Predicted T0    Predicted T1
P                  87.70%      88.25%          87.66%
Ca                 92.18%      89.71%          92.08%
K                  92.37%      88.07%          89.68%
Zn                 89.25%      82.45%          81.41%
N                  76.37%      72.41%          75.43%

As such, as shown in FIG. 8G, the microbiome is not only capturing the local microbial “signature”, but in conjunction with network properties and soil chemical properties, it is an excellent predictor of petiole chemical composition.
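The percentage of variation explained reported for the test set can be illustrated with the standard coefficient of determination, computed from actual and predicted log-ratio values. This is a generic sketch: the text does not state which R² variant was used, and the function name and values below are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total.

    Assumption: the standard definition; the source does not specify the
    exact formula behind its reported percentages.
    """
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical actual vs. predicted petiole log-ratios for one element.
actual_values = [-1.2, -0.8, -1.0, -0.5, -1.5]
predicted_values = [-1.1, -0.9, -1.0, -0.6, -1.4]
print(f"{100 * r_squared(actual_values, predicted_values):.2f}%")
```

A value of 1 means the predictions reproduce the observations exactly, while a value of 0 means they do no better than predicting the mean of the observed values.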

To return insights related to relationships between predictors and predicted variables, the method then included steps for fitting a multi-level Bayesian model (shown in FIG. 8H) with the following architectural aspects configured to ensure a good fit: a baseline was fit for each petiole element for each location, the most important coefficients from the previous regression models (e.g., network properties) were allowed to vary across timepoints, and all network properties and microbiome principal components were allowed to interact with the treatment variable (i.e., the model was configured to assess if the treatment is influencing the coefficients for network properties). Variations of this complex model can implement a subset of the variables or a different set of variables to improve fit. The table below depicts percentages of variation explained (i.e., R² values) by the model for each petiole nutrient element associated with the dataset:

Petiole element    Fitted R²
P                  93.70%
Ca                 97.29%
K                  99.27%
Zn                 97.85%
N                  95.56%

FIG. 8I depicts predictions for petiole nutrients (e.g., P, Ca, K, and Zn) for additional samples, where potassium and phosphorus nutrient characteristics were demonstrated to have negative correlations with yield, and calcium and zinc were demonstrated to have positive correlations with yield. FIG. 8J depicts predicted ratios of various nutrients, as well as yield predictions, with respect to treatment and control groups at various locations.

FIG. 8K depicts results of applying three treatments configured for increasing phosphorus solubilization in soil, with returned outputs depicting ratios of various nutrients in response to the treatments, as a proxy for yield (e.g., in terms of average tons/hectare).

Variations of the models described for characterizing nutrient data and/or yield predictions can be structured in another suitable manner (e.g., in relation to model architecture, in relation to input variables processed, etc.). For instance, extensions of described models can be configured to process physicochemical data to return results evaluating the importance of different features in all models, such that locations can be ordered by their “importance” to the predictive power of the models. Such analyses can then be used to rationalize a set of desired or “ideal” soil physicochemical properties and/or weather conditions for which to efficiently evaluate products, other agricultural inputs, and/or other management practices.

2.4.6 Crop Feature—Crop Nutrient Characteristics Example 2

Variations of the models described were further adapted for returning predictions of a crop's nutrient data (e.g., wheat nutrient data) based upon a limited number of input features (e.g., soil microbiome-derived features, soil physicochemical features). Different management conditions (e.g., till versus no-till soils) were also considered in relation to nutrient and yield predictions. The example method for wheat analysis thus includes a comparison between the status of the soil during different time-points, comparing the evolution of the microbiome of treated (tilled) and untreated (no-till) blocks in several locations. Altogether, this approach limits the impact of the environmental changes in the analysis, and identifies the specific changes caused by the type of soil management in the microbial community.

To characterize both bacterial and fungal microbial communities associated with soil samples, the method includes amplification and sequencing of the 16S rRNA (for prokaryotes) and ITS (for fungi) marker genes (e.g., as described above) for samples (e.g., samples acquired according to variations described above with methods for preventing cross contamination) at multiple geographical locations (e.g., 3 locations) and multiple time points (e.g., 5 time points). The method then includes phylogenetic assignment of each sample based on an average of 300,000 high-quality raw sequencing reads against a taxonomically classified sequence database. The method also includes computation of functional and ecological indexes according to embodiments, variations, and examples of methods described above, with soil physical/chemical properties evaluated (e.g., using Waypoint Analytical's Mehlich 3 Extraction with a suitable buffer), including pH, buffer pH, P, K, Mg, Ca, organic matter, CEC, % cation saturations, B, S, Fe, Mn, Cu, Zn, and Na. The method further includes steps for providing an individual report per sample as well as a guide for evaluating results and implementing recommendations. Once physicochemical properties as well as taxonomic, functional, and ecological indexes are obtained for all samples, the method can be used to evaluate potential relationships between the soil's initial characteristics, the microbiome, and the nutritional quality of the wheat. As such, similar to methods described above, the method here can be adapted to use classic statistical, Bayesian, and/or machine learning models based on the physicochemical and biological properties of soil to predict, or correlate those properties with, the nutritional quality of a wheat species (e.g., Triticum aestivum L.).

Variations of the methods described can further be configured to evaluate the carbon sequestration capabilities at an agricultural site (e.g., using methods adapted from those above in relation to yield and nutrient characterizations and predictions). As such, the adapted methods can be used to benefit land managers in relation to obtaining soil carbon credits from their lands, predicting shelf life of crops based on the soil microbiome, and/or in relation to other suitable benefits.

2.5 Methods—Insights and Interventions

Step S150 recites: executing an action for producing a desired outcome in relation to the agriculture site, with respect to a specific soil type and a specific crop, based upon the analysis. Step S150 functions to process outputs of prior steps in order to generate insights and/or execute actions that can improve productivity, correct issues, and/or increase sustainability of practices at the agriculture site(s) being assessed. In particular, agricultural inputs and management practices can have inconsistent field performance with uninformed application, where, in relation to some inputs, different strains and species can have different functional performance under specific environmental and ecological conditions. As such, Step S150 can provide agricultural inputs and implement management practices in an informed manner that is targeted to specific crops, soil types, and/or environmental conditions.

In variations, executing the action can include generating digital objects encoding instructions for controlling apparatus associated with an operator managing the agriculture site. In variations, executed actions can include or be associated with one or more of: maintaining a status of an agriculture site by providing guidance for maintaining current management statuses and/or products used; responding to an issue detected at the agriculture site(s) being assessed (e.g., in relation to pathogen presence or increased abundance of a detrimental microorganism, in relation to decreased abundance of a beneficial microorganism, in relation to correcting a perturbation, in relation to adjusting application of a product at the agriculture site, implementing protective measures against environmental effects, etc.); responding to or otherwise correcting other undesired statuses at one or more agriculture sites being monitored; providing information regarding site characteristics to a manager/operator/other entity associated with the agriculture site(s); performing decision-making guidance (e.g., in relation to analyses indicative of sustainability of practices, in relation to long term effects of use of one or more products, etc.); and performing other suitable actions.

In generating recommended actions, step S150 can include returning notifications or other information derived from the analyses and other outputs of step S140 in a visual format, in an audio format, in a haptic format, and/or in any other suitable observable format, to a manager, operator, and/or other entity associated with the agriculture site(s) being assessed. As such, variations of step S150 can include generating digital objects (e.g., in visual data formats, in audio data formats, in haptic data formats, encoding information) or instructions for generating digital objects, in communication with client devices (e.g., mobile devices or other devices that are associated with a manager, operator, and/or other entity associated with the agriculture site(s)), where the client devices include visual output components (e.g., a display), audio output components (e.g., speaker), haptic output components (e.g., vibrators), and/or any other suitable components. Client devices can also include input components (e.g., keypads, touch displays, microphones, joysticks, mice, etc.) such that the managers, operators, or other entities associated with the agriculture site(s) can communicate inputs (e.g., commands) related to the generated analyses.

Additionally or alternatively, generating recommended actions can include generating control instructions for apparatus (e.g., machinery, robotic apparatus configured to traverse an agricultural site, other apparatus) configured to execute computer-readable instructions for management of the agriculture site(s).
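One way to picture a digital object encoding a control instruction is a small serializable record. The sketch below is illustrative only: the field names, the subsystem and mode identifiers, and the choice of JSON as the encoding are assumptions, not a format specified by the description.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class ControlInstruction:
    subsystem: str  # hypothetical identifier, e.g., "watering"
    mode: str       # hypothetical operation mode, e.g., "low_flow"
    parameters: Dict[str, Any] = field(default_factory=dict)


def encode_instruction(instruction: ControlInstruction) -> str:
    """Serialize the instruction so it can be sent to an apparatus or client device."""
    return json.dumps(asdict(instruction), sort_keys=True)


def decode_instruction(payload: str) -> ControlInstruction:
    """Rebuild the instruction on the receiving side."""
    return ControlInstruction(**json.loads(payload))


instruction = ControlInstruction("watering", "low_flow", {"duration_min": 30})
payload = encode_instruction(instruction)
```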

In variations, control instructions can involve instructions for controlling operation modes of one or more of: watering subsystems (e.g., in relation to water distribution through conduits and/or sprinklers to the agriculture site(s)); product delivery subsystems in communication with watering subsystems (e.g., delivery subsystems in communication with watering subsystems through fluidic components, valves, etc.); robotic crop handling subsystems (e.g., in relation to removal of pathogen-affected crop portions); robotic crop picking subsystems (e.g., in relation to automated harvesting at optimal time periods in relation to improving production, in relation to efficiency of new production generation post-harvesting, in relation to minimization of wasted product, etc.); robotic nutrient delivery or pesticide delivery subsystems (e.g., in relation to initiating delivery, in relation to stopping delivery, in relation to adjusting frequency of delivery, in relation to adjusting delivery dosages, etc.); greenhouse subsystems; temperature control subsystems (e.g., in relation to modes for controlling environmental temperature of the agriculture site, etc.); light control subsystems (e.g., in relation to modes for controlling environmental light of the agriculture site, in relation to transitioning between on and off states, in relation to light spectrum delivered, etc.); gas environment subsystems (e.g., in relation to modes for controlling environmental gas composition of the agriculture site, etc.); humidity control subsystems (e.g., in relation to modes for controlling environmental humidity levels of the agriculture site, etc.); pressure control subsystems (e.g., in relation to modes for controlling environmental pressure of the agriculture site, etc.); and other suitable subsystem(s) of the agriculture site(s).

In examples, instructions for controlling operation modes of watering subsystems (e.g., in relation to water distribution through conduits and/or sprinklers to the agriculture site(s)) can be automatically executed in response to detected states of undesired watering levels based upon model outputs from other steps of the method. As such, controlling operation modes can include transitioning the watering subsystems between various states of flow, on-off states, etc. Control can be modulated in relation to constraints associated with water usage (e.g., times of drought, in relation to water usage incentives, etc.).
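The state transitions described above can be sketched as a small decision rule. The states, the threshold, and the parameter names below are hypothetical; the sketch assumes a model output expressed as a predicted soil moisture fraction and a single flag representing a water-usage constraint.

```python
from enum import Enum


class WateringState(Enum):
    OFF = "off"
    LOW_FLOW = "low_flow"
    HIGH_FLOW = "high_flow"


def next_watering_state(
    predicted_moisture: float,
    target_moisture: float,
    severe_deficit: float = 0.10,
    water_restricted: bool = False,
) -> WateringState:
    """Pick a flow state from the moisture deficit, capped under water restrictions."""
    deficit = target_moisture - predicted_moisture
    if deficit <= 0:
        return WateringState.OFF
    if deficit >= severe_deficit and not water_restricted:
        return WateringState.HIGH_FLOW
    # Small deficit, or a large deficit during a restriction (e.g., drought).
    return WateringState.LOW_FLOW
```

For example, a large deficit normally selects `HIGH_FLOW`, while the same deficit under a restriction is limited to `LOW_FLOW`.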

In examples, instructions for controlling operation modes of product delivery subsystems (e.g., delivery subsystems in communication with watering subsystems through fluidic components, valves, etc.) can be automatically executed in response to detected states of undesired supplement levels based upon model outputs from other steps of the method. As such, controlling operation modes can include transitioning the delivery subsystems between various states of product dosage, flow rates, on-off states, etc.

In examples, instructions for controlling operation modes of robotic crop handling subsystems (e.g., in relation to removal of pathogen-affected crop portions), robotic crop picking subsystems (e.g., in relation to automated harvesting at optimal time periods in relation to improving production, in relation to efficiency of new production generation post-harvesting, in relation to minimization of wasted product, etc.), robotic nutrient delivery or pesticide delivery subsystems (e.g., in relation to initiating delivery, in relation to stopping delivery, in relation to adjusting frequency of delivery, in relation to adjusting delivery dosages, etc.), and/or other robotic subsystems can be automatically executed in response to detected states of harvesting time, pathogen detection, nutrient states, pest presence, and/or other factors based upon model outputs from other steps of the method. As such, controlling operation modes can include transitioning the robotic subsystems between various states of actuation.

In examples, instructions for controlling operation modes of greenhouse subsystems, temperature control subsystems (e.g., in relation to modes for controlling environmental temperature of the agriculture site, etc.), light control subsystems (e.g., in relation to modes for controlling environmental light of the agriculture site, in relation to transitioning between on and off states, in relation to light spectrum delivered, etc.), gas environment subsystems (e.g., in relation to modes for controlling environmental gas composition of the agriculture site, etc.), humidity control subsystems (e.g., in relation to modes for controlling environmental humidity levels of the agriculture site, etc.), pressure control subsystems (e.g., in relation to modes for controlling environmental pressure of the agriculture site, etc.), and/or other environmental control subsystems can be automatically executed in response to detected states of environmental conditions suited to or unsuited for desired outcomes, and/or other factors based upon model outputs from other steps of the method. As such, controlling operation modes can include transitioning the environmental control subsystems between various states of temperature control, light control, gas control, humidity control, pressure control, and/or other environmental control. Control can be modulated in relation to constraints associated with power usage (e.g., times of peak demand, in relation to demand incentives, etc.).

Additionally or alternatively, step S150 can include generation of control instructions for automated vehicle platforms associated with controlling vehicles associated with the agriculture site(s), with respect to surveying, management, and/or other operation modes.

Step S150 can include or be associated with executing the recommended action S151 through electronic communication with one or more subsystems described above, which functions to automatically execute recommended actions in order to reduce operator workload in relation to agriculture site management. Executed actions can include or be associated with one or more of: maintaining a status of an agriculture site by providing guidance for maintaining current management statuses and/or products used; responding to an issue detected at the agriculture site(s) being assessed (e.g., in relation to pathogen presence, in relation to detrimental microorganism presence, in relation to correcting a perturbation, in relation to adjusting application of a product at the agriculture site, implementing protective measures against environmental effects, etc.); responding to or otherwise correcting other undesired statuses at one or more agriculture sites being monitored; maintaining or improving desired statuses at one or more agriculture sites being monitored (e.g., in relation to biocontrol microorganism presence, in relation to stress tolerance microorganism presence, in relation to plant growth promoter microorganism presence, in relation to nutrient metabolizing microorganism presence, etc.); providing information regarding site characteristics to a manager/operator/other entity associated with the agriculture site(s); performing decision-making guidance (e.g., in relation to analyses indicative of sustainability of practices, in relation to long term effects of use of one or more products, etc.); and performing other suitable actions, as described above in embodiments, variations, and examples of agriculture site management control and notification/report delivery. Embodiments, variations, and examples of actions are further described in U.S. application Ser. No. 17/119,972 filed 11 Dec. 2020, incorporated by reference above.

2.5.1 Agricultural Input Examples

In examples, agricultural inputs can include substances, microorganisms, or mixtures thereof (e.g., plant biostimulants) configured to promote plant health and quality and to recycle crop residues with low environmental impact. Such inputs can include biofertilizers, biostimulants, biocontrol agents, agents for hormone production, agents configured to promote stress adaptation, nutrients, and/or other inputs. Bioinoculants based on microorganisms, in particular, can include functional plant growth promoter species having a direct impact on plant health and yield. In examples, such biostimulants can function by: improving plant growth by increasing root hair development in a phytohormone-mediated process using an Azospirillum brasilense strain; increasing the tolerance to abiotic stresses through the action of an ACC deaminase produced by a Burkholderia unamae strain; increasing plant growth by enhanced nutrient (P) acquisition (e.g., in cucumber and tomato plants) using a Bacillus sp. strain; enhancing nodule formation by a two-species consortium of Pseudomonas putida plus Rhizobium sp. (e.g., in beans); improving grain yield (e.g., in rice) by increasing panicle number through the use of an Azospirillum amazonense strain; improving nutrient (e.g., phosphorus) solubilization; and/or by another suitable method.

Additionally or alternatively, agricultural inputs can include microbial strains that have an indirect effect on soil and plant health, as tools for in situ microbiome engineering, promoting the development of other beneficial microbial species, improving the resistance of the microbiome to the invasion of plant pathogens (e.g., as in B. amyloliquefaciens QST713, a strain isolated from the soil of a Californian organic peach orchard with demonstrated effective broad-spectrum bactericide and fungicide activity), having another function in affecting composition and structure of native communities (e.g., as in allochthonous strains), and thus reducing the abundance of pathogenic species, thereby increasing the resistance of the plant against diseases.

In the specific example provided above, B. amyloliquefaciens QST713-based biostimulants can be used to reduce the transitivity of the co-occurrence fungal network of the rhizosphere and bulk soil where they are applied through their biofungicide activity, but in a reversible manner, thereby improving yield in the short term while maintaining crop health in the long term. That is, effects of B. amyloliquefaciens QST713-based biostimulants can operate in a transient, reversible, and/or non-permanent manner (e.g., the fungal communities return to their original state post-harvest), thereby improving yield significantly but not having a permanent or adverse effect upon crop or site health.
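For orientation, the transitivity of a co-occurrence network can be computed as the fraction of connected triples of organisms that are closed into triangles. The sketch below uses this standard graph-theoretic definition, which is assumed here to match the usage above; the function name and the toy edge lists are illustrative.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, Set, Tuple


def transitivity(edges: Iterable[Tuple[Hashable, Hashable]]) -> float:
    """Closed connected triples divided by all connected triples (0.0 if none exist)."""
    neighbors: Dict[Hashable, Set[Hashable]] = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    closed = 0
    triples = 0
    for node, nbrs in neighbors.items():
        # Every pair of neighbors forms a connected triple centered on this node.
        for u, v in combinations(nbrs, 2):
            triples += 1
            if v in neighbors[u]:
                closed += 1
    return closed / triples if triples else 0.0


# Toy co-occurrence edges between taxa (assumed labels).
before_treatment = [("a", "b"), ("b", "c"), ("a", "c")]
after_treatment = [("a", "b"), ("b", "c")]
```

In this toy case, removing one co-occurrence edge from the triangle lowers the transitivity from 1.0 to 0.0.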

2.5.2 Management Practice Examples

In examples, management practices can include one or more of: implementation of cover crops, conservation tillage, irrigation efficiency-associated methods, contour farming, implementation of waste storage structures for animal waste, critical area planting, crop residue management practices, crop rotation, diversion, forest harvest management, use of grade stabilization structures, application of grassed waterways, use of high tunnels, implementation of integrated pest management, implementation of silvopasture, implementation of cover grazing, implementation of no-till farming, implementation of nutrient management plans, roof runoff management, implementation of rotational grazing, implementation of vegetative filter strips to filter runoff and prevent contaminants from entering water sources, implementation of field borders, implementation of lined waterways, implementation of riparian buffers, and/or other management practices.

As described above, management practices can be categorized as, or be specific to, one of: conventional management practices, organic management practices, biodynamic management practices, and other suitable types of management practices. As such, executing the action can include implementing one or more of: a conventional management practice, an organic management practice, and a biodynamic management practice at the agriculture site.

2.5.3 Practical Implementation Examples

Based on embodiments, variations, and examples of methods and models (e.g., trained models) described above, as shown in FIGS. 8A and 8B, step S150 can include receiving a dataset associated with one or more of a crop type and an agricultural site (e.g., geolocation of the agricultural site) under evaluation S151a and/or receiving a sample from the agriculture site under evaluation S151b; processing at least one of the dataset and the sample with an embodiment, variation, or example of the model(s) described above S152; returning a prediction of a crop-associated feature from the model(s) S153; and based upon the prediction, recommending or executing implementation of at least one agricultural input or management practice at the agricultural site under evaluation S154.
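The flow of steps S151a/S151b through S154 can be sketched as a single function that takes a request, applies a model, and maps the prediction to an action. Everything named below (`SiteRequest`, `evaluate_site`, the toy model, the feature key, and the decision threshold) is hypothetical and stands in for the trained models and recommendation logic described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class SiteRequest:
    crop_type: str
    geolocation: Tuple[float, float]                    # dataset input (S151a)
    sample_features: Optional[Dict[str, float]] = None  # sample-derived input (S151b)


def evaluate_site(
    request: SiteRequest,
    model: Callable[[str, Dict[str, float]], float],
    recommend: Callable[[float], str],
) -> Tuple[float, str]:
    features = {
        "latitude": request.geolocation[0],
        "longitude": request.geolocation[1],
    }
    if request.sample_features:
        features.update(request.sample_features)
    prediction = model(request.crop_type, features)  # S152 and S153
    return prediction, recommend(prediction)         # S154


def toy_model(crop_type: str, features: Dict[str, float]) -> float:
    # Stand-in for a trained yield model; the coefficients are arbitrary.
    return 5.0 + 2.0 * features.get("p_solubilization_index", 0.0)


def toy_recommend(predicted_yield: float) -> str:
    if predicted_yield < 6.0:
        return "apply phosphorus solubilizer"
    return "maintain current practice"


prediction, action = evaluate_site(
    SiteRequest("potato", (42.0, -1.6), {"p_solubilization_index": 0.25}),
    toy_model,
    toy_recommend,
)
```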

In variations, S154 can extend to evaluating a set of agricultural inputs and/or management practices in parallel within subportions of the agricultural site under evaluation by applying the set of agricultural inputs and/or management practices in parallel at a set of subportions of the agricultural site S155; processing samples acquired from the set of subportions S156; and returning outputs characterizing crop-associated features and/or characteristics of the set of subportions, at a set of time points, in response to the set of agricultural inputs and/or management practices S157, thereby characterizing effectiveness of each of the set of agricultural inputs and/or management practices.
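The parallel evaluation of steps S155 through S157 can be sketched as comparing each input against a control subportion at every time point and ranking inputs by their average effect. This is a minimal sketch, assuming one measured crop-associated value per subportion and time point; the data layout, the function name, and the use of a simple mean difference as the effectiveness measure are assumptions.

```python
from statistics import mean
from typing import Dict, List, Tuple


def rank_treatments(
    responses: Dict[str, List[List[float]]], control: str = "control"
) -> List[Tuple[str, float]]:
    """responses[name][t] holds values measured in subportions at time point t."""
    control_means = [mean(values) for values in responses[control]]
    effects = {}
    for treatment, series in responses.items():
        if treatment == control:
            continue
        per_time = [mean(values) - c for values, c in zip(series, control_means)]
        effects[treatment] = mean(per_time)  # average effect across time points
    return sorted(effects.items(), key=lambda item: item[1], reverse=True)


# Toy measurements (assumed): two subportions per input, two time points.
responses = {
    "control": [[4.0, 4.0], [5.0, 5.0]],
    "input_a": [[4.0, 5.0], [6.0, 7.0]],
    "input_b": [[4.0, 4.0], [5.0, 6.0]],
}
ranking = rank_treatments(responses)
```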

In one variation of step S151a, an entity associated with the crop type and/or agricultural site under evaluation can be prompted to provide information pertaining to a geolocation of the agricultural site and/or the crop type the entity is interested in cultivating. Then, upon processing the information (e.g., using variations of models described above, by extracting physicochemical data and/or weather data from databases based upon the geolocation, etc.) according to step S152, the method can return a prediction of the crop-associated feature (e.g., yield prediction, nutrient prediction, health prediction, etc.) for the crop type and/or agricultural site under evaluation according to step S153, and execute an action (e.g., providing product recommendations, implementing management practices, applying products targeted to the condition of the crop/agricultural site, etc.) for achieving desired outcomes according to step S154. Examples of desired outcomes can include one or more of: an increase in yield, protection of one or more crop types from disease, improving nutrition state or content of crop types, increasing the shelf life of crop types, achieving agricultural site states that facilitate selling of soil carbon credits, and other outcomes.

In one variation of step S151b, an entity associated with the crop type and/or agricultural site under evaluation can be prompted to provide information pertaining to a geolocation of the agricultural site and/or the crop type the entity is interested in cultivating, along with a sample (e.g., derived from soil, derived from a crop portion, derived from material produced from a crop, etc.). Then, upon processing the information (e.g., using variations of models described above, by generating microbiome composition and structure-associated features, by extracting physicochemical data and/or weather data from the sample(s) directly, etc.) according to step S152, the method can return a prediction of the crop-associated feature (e.g., yield prediction, nutrient prediction, health prediction, etc.) as well as information pertaining to the sample and/or agricultural site (e.g., providing physicochemical data for the agricultural site) according to step S153, and execute an action (e.g., providing product recommendations, implementing management practices, applying products targeted to the condition of the crop/agricultural site, etc.) for achieving desired outcomes according to step S154. Examples of desired outcomes can include one or more of: an increase in yield, protection of one or more crop types from disease, improving nutrition state or content of crop types, increasing the shelf life of crop types, achieving agricultural site states that facilitate selling of soil carbon credits, and other outcomes.

In relation to evaluating agricultural inputs and/or management practices in parallel according to variations of steps S155-S157, variations of the method can include steps for performing field trials of multiple agricultural inputs (e.g., biostimulants, biofertilizers, crop protection products, etc.) at one or more geolocations, at one or more time points, and/or in association with one or more crop types (e.g., genotypes, cultivars). Data from evaluation of sets of agricultural inputs and/or management practices can be used to generate additional data (e.g., training data, test data) for refining models described above.

In one example, the method can be used to evaluate agricultural inputs comprising phosphorus solubilizers, where processing of samples and/or other data from agricultural sites where the phosphorus solubilizers were applied produced outputs indicative of effectiveness of the phosphorus solubilizers. Evaluation can be based upon microbiome composition, functional and structural features, as well as crop symptoms determined directly. In the example, each product being examined in parallel was evaluated for effects on crop health (e.g., in relation to brown spot, in relation to black pit, in relation to charcoal rot, in relation to early blight, in relation to fusarium dry rot, in relation to fusarium wilt, in relation to gray mold, in relation to late blight, in relation to pink rot, in relation to pleospora herbarum, in relation to silver scurf, in relation to verticillium wilt, etc.); in relation to hormone production (e.g., in relation to auxin production, in relation to cytokinin production, in relation to gibberellin production, etc.); in relation to stress adaptation (e.g., in relation to exopolysaccharide production, in relation to ACC deaminase abundance, in relation to heavy metal solubilization, in relation to salicylic acid production, in relation to salt tolerance, in relation to abscisic acid production, in relation to siderophore production, etc.); in relation to biocontrol behavior (e.g., in relation to fungicide agents, in relation to insecticide agents, in relation to nematicide agents, in relation to bactericide agents, etc.); in relation to nutritional pathways associated with major compounds associated with carbon pathways (e.g., carbon fixation, aerobic respiration, fermentation, methanogenesis, organic matter release, etc.); in relation to nutrition associated with major compounds associated with nitrogen pathways (e.g., inorganic nitrogen release, inorganic nitrogen consumption, inorganic nitrogen cycle health, etc.); in relation to nutrition associated with major compounds associated with phosphorus pathways (e.g., inorganic phosphorus solubilization, inorganic phosphorus consumption, organic phosphorus assimilation, etc.); in relation to nutrition associated with major compounds associated with potassium pathways (e.g., potassium solubilization, potassium consumption, etc.); in relation to nutrition associated with major compounds associated with other pathways; in relation to other micronutrient factors (e.g., iron assimilation, zinc transport equilibrium, manganese transport equilibrium, sulfur cycle equilibrium, calcium transport, copper export, magnesium transport, chlorine transport, etc.); and/or other factors.

Variations of the methods described also evaluated soil microbiome agronomic indices (e.g., as described above) that improved the most upon application of each product, in relation to metrics and categories of evaluation described above.

Variations of the methods can, however, be implemented in another suitable manner or drive other suitable outcomes.

3. System

As shown in FIG. 9, a system 200 for characterization and improvement of an agricultural site includes: one or more sample reception subsystems 210; one or more sample processing subsystems 220 in communication with the sample reception subsystems 210; a computing platform 230 comprising one or more processing subsystems comprising non-transitory computer-readable medium comprising instructions stored thereon, that when executed by the processing subsystems perform one or more steps of methods described above; and one or more action execution subsystems 240 configured to execute actions informed by processes of the computing platform 230. In variations, the action execution subsystems 240 can be configured to execute control instructions generated by the computing platform 230, where control instructions can involve instructions for controlling operation modes of one or more of: watering subsystems (e.g., in relation to water distribution through conduits and/or sprinklers to the agriculture site(s)); product delivery subsystems in communication with watering subsystems (e.g., delivery subsystems in communication with watering subsystems through fluidic components, valves, etc.); robotic crop handling subsystems (e.g., in relation to removal of pathogen-affected crop portions); robotic crop picking subsystems (e.g., in relation to automated harvesting at optimal time periods in relation to improving production, in relation to efficiency of new production generation post-harvesting, in relation to minimization of wasted product, etc.); robotic nutrient delivery subsystems (e.g., in relation to initiating delivery, in relation to stopping delivery, in relation to adjusting frequency of delivery, in relation to adjusting delivery dosages, etc.); greenhouse subsystems; temperature control subsystems (e.g., in relation to modes for controlling environmental temperature of the agriculture site, etc.); light control subsystems (e.g., in relation to modes for controlling environmental light of the agriculture site, in relation to transitioning between on and off states, in relation to light spectrum delivered, etc.); gas environment subsystems (e.g., in relation to modes for controlling environmental gas composition of the agriculture site, etc.); humidity control subsystems (e.g., in relation to modes for controlling environmental humidity levels of the agriculture site, etc.); pressure control subsystems (e.g., in relation to modes for controlling environmental pressure of the agriculture site, etc.); and other suitable subsystem(s) of the agriculture site(s).

Embodiments of the system 200 are configured to perform one or more portions of methods described above; however, variations of the system 200 can be configured to perform other suitable methods.

4. Conclusions

The invention(s) decipher different ecological strategies that bacterial, fungal, and/or other organism communities adopt in the face of different levels of farming intensification and product use, and explore how these may impact soil health in terms of external factors and plant pathogens. In applications, outputs of the invention(s) can guide interventions and/or other practices to improve agriculture sites, based on observed community assembly strategies. Examples of such strategies include a collaborative, well-mixed habitat in soils under biodynamic management, with potentially higher resistance towards, at least, pathogen loads, and a more divided habitat in soils under conventional management, with fungi belonging to more niches but with a lower reaction range to pathogen loads. Under this framework, the inventions have practical applications with relevance for agriculture sustainability, and with respect to interventions that can be designed to drive a better future for agro-ecosystems. For instance, evaluating how emergent properties change over time series may give clear indications about the resistance and resilience of fungal communities, or shed light on the dynamics of soils under different anthropogenic disturbances. For now, the defined ecological emergent properties may be used as biomarkers to measure the effect of farming practices or temperature change consequences on the health status of soils. Given the key role that microorganisms play in agri-food systems in general, and in crop yield in particular, these findings are useful for establishing monitoring programs of crop-associated microbial diversity, supporting the work of alliances such as the Soil Health Institute, the U.S. Department of Agriculture, or the Global Initiative of Crop Microbiome and Sustainable Agriculture, while promoting soil health through sustainable agriculture strategies.

The FIGURES illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to preferred embodiments, example configurations, and variations thereof. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block can occur out of the order noted in the FIGURES. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

As a person skilled in the art will recognize from the previous detailed description and from the figures and claims, modifications and changes can be made to the preferred embodiments of the invention without departing from the scope of this invention defined in the following claims.

What is claimed is:
 1. A method for evaluating and predicting a set of crop-associated features at an agriculture site, the method comprising: receiving a set of samples associated with the agriculture site; generating a sample dataset upon processing the set of samples with a set of sample processing operations; generating a set of microbiome-associated features upon performing a set of transformation operations upon the sample dataset, wherein performing the set of transformation operations comprises generating a first grouping of positive pairs of organisms and a second grouping of negative pairs of organisms represented in the sample dataset, and generating a network property dataset upon transforming the first grouping of positive pairs of organisms and second grouping of negative pairs of organisms into a set of aggregate matrices representing co-inclusion and co-exclusion of organisms across a metacommunity represented in the sample dataset; and returning an analysis characterizing the set of crop-associated features based upon the set of microbiome-associated features.
 2. The method of claim 1, wherein the set of agriculture samples comprise at least one of: a soil sample, a root sample, a foliage sample, a liquid sample, and a crop-derived sample.
 3. The method of claim 1, wherein the set of sample processing operations comprises: a nucleic acid extraction operation; a library preparation operation upon amplification and sequencing of target regions comprising a 16S rRNA V4 region and an ITS1 region to generate a set of identified sequences; a filtering operation with filtering of chimera sequences and clustering of non-singleton sequences of the set of identified sequences; and a mapping of sequences output by the filtering operation to operational taxonomic units (OTUs) with an identity threshold.
 4. The method of claim 3, wherein the set of sample processing operations further comprises performing an amplicon sequence variant operation involving: clustering of the set of identified sequences into a set of clusters; and identifying, for each of the set of clusters, a centroid sequence representing a most abundant sequence of a respective cluster of the set of clusters.
 5. The method of claim 4, wherein clustering the set of identified sequences comprises grouping sequences of the set of identified sequences having a difference in nucleotides satisfying a threshold condition.
 6. The method of claim 1, wherein the set of sample processing operations further comprises tagging sequence data of the sample dataset with contextual data comprising geographic location information, meteorological metadata, climatic information, and management practice information.
 7. The method of claim 1, wherein the set of sample processing operations further comprises tagging sequence data of the sample dataset with a set of metacommunity descriptors corresponding to a set of communities within a same habitat associated with the agriculture site.
 8. The method of claim 1, wherein performing the set of transformation operations comprises extracting a set of network features characterizing community assembly associated with organisms represented in the sample dataset.
 9. The method of claim 8, wherein the set of microbiome-associated features generated upon performing the set of transformation operations comprises: a number of connected components represented within a subnetwork of organisms, and a modularity factor.
 10. The method of claim 8, wherein the set of microbiome-associated features generated upon performing the set of transformation operations further comprises: a clustering coefficient, an average path length between network components, an assortativity factor representing homophily of a network, a proportion co-inclusion factor normalized to a total number of combinations of all OTUs in the set of samples; and a proportion co-exclusion factor normalized to a total number of combinations of all OTUs in the set of samples.
 11. The method of claim 8, wherein the set of microbiome-associated features is derived from combinations of p-hypergeometric (PH) network properties and Bayesian factor (BF) network properties.
 12. The method of claim 8, wherein generating a network property dataset comprises characterizing commensalism factors, facilitation factors, mutualism factors, antagonism factors, competition factors, neutralism factors, and amensalism factors, represented in the sample dataset associated with organisms of the agriculture site.
 13. The method of claim 1, wherein returning the analysis comprises generating a set of outputs corresponding to crop-associated features of the agriculture site.
 14. The method of claim 13, wherein said crop-associated features comprise: yield characteristics; crop health and disease states; crop age characteristics comprising lifespan, cycles of productivity, vegetative growth state; and crop shelf life.
 15. The method of claim 1, wherein returning the analysis comprises generating a set of outputs characterizing agriculture site statuses associated with nutritional composition features of soil at the agriculture site.
 16. The method of claim 1, wherein returning the analysis comprises generating a set of outputs characterizing responses to a set of perturbations associated with the agriculture site.
 17. The method of claim 16, wherein the set of perturbations comprise: use of various products, environmental perturbations, agricultural inputs, and management practices.
 18. The method of claim 1, wherein generating the set of microbiome-associated features comprises: generating a set of training data streams from a set of training samples at the agriculture site, the training data stream; applying one or more of the set of transformation operations to the set of training data streams corresponding to emergent properties, community properties, taxonomic properties, and functional properties in association with inputs and practices at the agriculture site; creating a training dataset derived from the set of training data streams and the set of transformation operations; and training a machine learning model comprising architecture for returning at least one of the set of microbiome-associated features and the analysis, in one or more stages, based upon using the training dataset.
 19. The method of claim 1, wherein the set of samples is collected across a set of time points, and wherein returning the analysis comprises returning characterizations of evolving population dynamics within fungal populations at the agriculture site, based upon alpha diversity and beta diversity patterns.
 20. The method of claim 1, wherein the set of samples is collected contemporaneously with application of a bioinoculant at the agriculture site, and wherein returning the analysis comprises returning a yield characterization of crops at the agriculture site in response to application of the bioinoculant.